(54) Document managing apparatus, data compressing method, and data decompressing method



(57) In a document managing apparatus for forming a compressed document file in which keyword retrieval remains available, a first process operation and a second process operation are executed alternately. In the first process operation, when document data to be compressed is given, the respective characters are written directly into a compressed document file until a start control character string appears in the document data. In the second process operation, data obtained by coding the respective characters are written into the compressed document file until an end control character string appears in the document data. Also, in this document managing apparatus, a third process operation and a fourth process operation are performed alternately when the compressed document file is restored. In the third process operation, the respective data (characters) contained in the compressed document file are output directly until the start control character string appears. In the fourth process operation, the decoding operation is carried out until the end control character string appears in the restored result of the data contained in the compressed document file.
Description

BACKGROUND OF THE INVENTION 

1. Field of the Invention

The present invention generally relates to a document managing apparatus, a data compressing method, and a data decompressing method. More specifically, the present invention is directed to a document managing apparatus for compressing document data and managing the compressed document data, and to data compressing and data decompressing methods used to compress and decompress document data and the like.

2. Description of the Related Art 

Recently, various sorts of data such as character codes, vector information, and image information have come to be processed in computers. Since the amount of processed data is increasing rapidly, these data are compressed in order to reduce transmission time and to use storage apparatuses more efficiently.

For instance, an application program called an "archiver" forms a single compressed data file from more than one file. Files with low use frequencies and old files are compressed with the archiver, so that the capacity they occupy is reduced. When the contents of files are communicated, using the compressed data file formed by the archiver shortens the time required for the communication and also lowers the communication cost.

Also, drive units such as hard disk units and floppy disk units may be operated as compression drive units. In systems having compression drive units, when a user issues a file writing instruction, the file is automatically compressed and the compressed file is stored in the compression drive unit. When the user issues a file reading instruction, the file stored in the compression drive unit is automatically restored.

It should be noted that since various sorts of data such as texts, machine languages, images, and voice are processed in computer systems, universal coding systems applicable to these various sorts of data are employed when files are compressed as described above. Concretely speaking, dictionary coding methods that exploit the recurrence of data (characters), arithmetic coding, which is classified as probability/statistical coding, and splay-tree coding have been utilized.

On the other hand, the contents of files that are not compressed can be confirmed by keyword retrieval. For example, in document data formed in the SGML (Standard Generalized Markup Language) format, tags are placed before and after a specific element contained in the document data, and the tags correspond to the content of that element. Consequently, in document data formed in the SGML format, the necessary information can be obtained by retrieving from the file the tag attached to the desired information and reading the character string stored after that tag.
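The tag-based retrieval described above can be sketched as follows. This is a minimal Python illustration under stated assumptions: the document is held as a plain string, elements carry no attributes and are not nested, and the tag names `TITLE` and `BODY` are hypothetical examples. A real SGML parser would also have to handle attributes, nesting and omitted end tags.

```python
from typing import List


def retrieve_tagged(document: str, tag: str) -> List[str]:
    """Return the text of every <tag>...</tag> element, in document order.

    Simplified sketch: assumes no attributes and no nesting of the same tag.
    """
    start_tag, end_tag = f"<{tag}>", f"</{tag}>"
    results: List[str] = []
    pos = 0
    while True:
        i = document.find(start_tag, pos)          # locate the tag itself
        if i < 0:
            break
        body_start = i + len(start_tag)
        j = document.find(end_tag, body_start)     # read up to the closing tag
        if j < 0:
            break
        results.append(document[body_start:j])
        pos = j + len(end_tag)
    return results


doc = "<DOC><TITLE>Data compression</TITLE><BODY>Text body</BODY></DOC>"
print(retrieve_tagged(doc, "TITLE"))  # ['Data compression']
```

This only works while the tag strings are present as plain characters; once the whole file has been compressed, `str.find` no longer sees them.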

However, once document data formed in the SGML format is compressed, tag retrieval can no longer be carried out. Accordingly, even when only a title needs to be confirmed, the entire compressed file must be restored, which makes the confirmation work time-consuming.

SUMMARY OF THE INVENTION 

An object of the present invention is to provide a 
document managing apparatus for forming compressed 
document data capable of being retrieved by employing 
a keyword. 

Another object of the present invention is to provide 
a data compressing method for forming compressed da- 
ta capable of being retrieved by using a keyword, and 
also to provide a data decompressing method for de- 
compressing the compressed data formed by this data 
compressing method. 

A first document managing apparatus is comprised
of a control character string storing unit, a coding unit, a
retrieving unit, and a control unit. The control character
string storing unit stores more than one start control
character string and more than one end control charac- 
ter string. The coding unit encodes a character to there- 
by output coded character data. The retrieving unit re- 
trieves a start control character string and an end control 
character string from a character string made by arrang- 
ing inputted characters. When said start control charac- 
ter string is retrieved by said retrieving unit, the control 
unit commences a process operation such that coded 
character string data produced by coding the inputted 
character string by said coding unit is outputted as an 
element of the compressed document data, and when 
said end control character string is retrieved by said re- 
trieving unit, this control unit commences another proc- 
ess operation such that the inputted character is directly 
outputted as an element of the compressed document 
data without character coding by said coding unit. 
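The control flow of the first document managing apparatus can be sketched in Python as below. Several assumptions are not from the source: the compressed document is modelled as a Python list in which directly output characters stay `str` and coded characters are `int`; the coding unit is stood in for by a caller-supplied `encode` function (here `ord`, which does not actually compress anything); and the control strings `<T>` and `</T>` are hypothetical.

```python
from typing import Callable, List, Sequence, Union

Item = Union[str, int]  # str: character output directly; int: code from the coding unit


def compress(text: str, starts: Sequence[str], ends: Sequence[str],
             encode: Callable[[str], int]) -> List[Item]:
    """Alternate between direct output and coding, switching on control strings."""
    maxlen = max(len(s) for s in [*starts, *ends])
    out: List[Item] = []
    coding = False   # False: direct-output mode; True: coding mode
    recent = ""      # tail of the input read since the last mode switch
    for ch in text:
        recent = (recent + ch)[-maxlen:]
        if coding:
            out.append(encode(ch))
            if any(recent.endswith(e) for e in ends):
                coding, recent = False, ""   # end control string: back to direct output
        else:
            out.append(ch)
            if any(recent.endswith(s) for s in starts):
                coding, recent = True, ""    # start control string: begin coding
    return out


packed = compress("<T>abc</T>xyz<T>q</T>", ["<T>"], ["</T>"], ord)
plain_part = "".join(x for x in packed if isinstance(x, str))
print(plain_part)  # <T>xyz<T>
```

The directly written characters remain searchable without any decoding. In this sketch the start control string is itself written directly and the end control string is coded, because each is output in the mode in which it is read.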

In other words, the first document managing apparatus forms, from the document data, compressed document data in which non-compressed data is mixed with compressed data. Consequently, the content of the compressed document data formed by the first document managing apparatus can be confirmed by keyword retrieval without restoring it. As a result, the document data can be managed effectively by the first document managing apparatus.

It should be understood that the compressed document data formed by the first document managing apparatus is restored by an apparatus comprising a control character string storing unit, a decoding unit, a judging unit, and a control unit. The control character string storing unit stores more than one start control character string and more than one end control character string. The decoding unit outputs a character obtained by decoding a code. The judging unit judges whether or not a start control character string or an end control character string is present at the tail of the restored character string. When said judging unit judges the presence of said start control character string, the control unit commences a process operation to output a character produced by decoding the code contained in said compressed document data with said decoding unit, and when said judging unit judges the presence of said end control character string, the control unit commences another process operation to output said compressed document data directly without decoding by said decoding unit.
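A matching restoration loop is sketched below under the same hypothetical assumptions: list-based compressed data, `chr` standing in for the decoding unit, and `<T>`/`</T>` as control strings. The point it illustrates is that the mode is derived from the tail of the restored string, not from any marker stored in the compressed data.

```python
from typing import Callable, List, Sequence, Union

Item = Union[str, int]


def decompress(items: Sequence[Item], starts: Sequence[str], ends: Sequence[str],
               decode: Callable[[int], str]) -> str:
    maxlen = max(len(s) for s in [*starts, *ends])
    out: List[str] = []
    decoding = False
    recent = ""   # tail of the restored string since the last mode switch
    for item in items:
        # The current mode, not the item's type, decides whether to decode.
        ch = decode(int(item)) if decoding else str(item)
        out.append(ch)
        recent = (recent + ch)[-maxlen:]
        if decoding and any(recent.endswith(e) for e in ends):
            decoding, recent = False, ""    # end control string restored
        elif not decoding and any(recent.endswith(s) for s in starts):
            decoding, recent = True, ""     # start control string restored
    return "".join(out)


# Hand-built compressed data: "<T>" and "xyz" direct, "abc</T>" coded with ord().
packed = ["<", "T", ">", *[ord(c) for c in "abc</T>"], "x", "y", "z"]
print(decompress(packed, ["<T>"], ["</T>"], chr))  # <T>abc</T>xyz
```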

In the first document managing apparatus of the present invention, a unit that outputs a code corresponding to said character by using a dynamic coding model (e.g., dynamic Huffman coding) is employed as the coding unit. Also, a unit that initializes the dynamic coding model used by said coding unit when said end control character string is retrieved by said retrieving unit is employed as the control unit. When the document managing apparatus is arranged in this manner, it is possible to form compressed document data of which a portion of the content can be restored on its own.
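The effect of initializing the dynamic model at each end control string can be illustrated with a minimal sketch. Move-to-front coding is used here only as a simple stand-in for an adaptive model such as dynamic Huffman coding; the 7-bit ASCII alphabet, the function names and the sample segments are assumptions. Because every coded segment starts from the same fresh model, any one segment can be decoded without first decoding the segments before it.

```python
from typing import List

ALPHABET = [chr(i) for i in range(128)]  # assumed 7-bit ASCII alphabet


def mtf_encode(segment: str) -> List[int]:
    table = list(ALPHABET)               # model initialized afresh for each segment
    codes = []
    for ch in segment:
        i = table.index(ch)
        codes.append(i)
        table.insert(0, table.pop(i))    # adapt: move the symbol to the front
    return codes


def mtf_decode(codes: List[int]) -> str:
    table = list(ALPHABET)               # same initial model as the encoder
    out = []
    for i in codes:
        ch = table[i]
        out.append(ch)
        table.insert(0, table.pop(i))
    return "".join(out)


segments = ["aaab</T>", "hello</T>"]     # coded regions, each closed by an end control string
coded = [mtf_encode(s) for s in segments]
print(coded[0][:3])            # [97, 0, 0]
print(mtf_decode(coded[1]))    # hello</T>
```

Decoding `coded[1]` never touches `coded[0]`; without the reset, the decoder would have to replay the first segment to reconstruct the model state.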

Also, in the first document managing apparatus, a unit may be employed as the control unit that, when it commences the process operation to output the inputted characters directly without coding by said coding unit, outputs the end control character string retrieved by said retrieving unit as an element of the compressed document data.

When the document managing apparatus is arranged in this manner, the document element sandwiched between the start control character string and the end control character string in the document data is stored directly in the compressed document data. Accordingly, this document managing apparatus extends the range over which keyword retrieval can be performed on the compressed document data.

In the first document managing apparatus of the present invention, a unit may be employed as the control unit that, when said end control character string is retrieved by said retrieving unit, substitutes each inputted character by using a substitution table defining the correspondence between input characters and output characters, and outputs the substituted result without coding by said coding unit.

When the document managing apparatus is arranged in this manner, compressed document data that contains no directly readable data is formed. As a result, even when the compressed document data formed by this document managing apparatus is transferred over the Internet, its content cannot be read by unauthorized machines. Accordingly, the secrecy of data communication can be improved by this document managing apparatus.
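The substitution of directly output characters can be sketched with Python's `str.maketrans` and `str.translate`. The table below, a rotation of the 52 ASCII letters by 13 positions, is purely hypothetical: the source does not specify a table. A fixed substitution of this kind hides the text from casual reading but is not strong encryption.

```python
import string

# Hypothetical substitution table: rotate the 52 ASCII letters by 13 positions.
PLAIN = string.ascii_letters
CIPHER = PLAIN[13:] + PLAIN[:13]
SUBSTITUTE = str.maketrans(PLAIN, CIPHER)
RESTORE = str.maketrans(CIPHER, PLAIN)


def substitute_direct_part(raw: str) -> str:
    """Substitute a segment that would otherwise be written out directly."""
    return raw.translate(SUBSTITUTE)


def restore_direct_part(stored: str) -> str:
    """Inverse substitution applied when the file is restored."""
    return stored.translate(RESTORE)


stored = substitute_direct_part("<TITLE>Report")
print(stored)                       # letters replaced, "<" and ">" unchanged
print(restore_direct_part(stored))  # <TITLE>Report
```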

It should be noted that when the apparatus is arranged to substitute characters and output the substituted characters, it is preferable to additionally employ a substituting unit that, when an instruction is issued to retrieve a certain character string from the compressed document data, substitutes said character string by using said substitution table, and a retrieving unit that executes the retrieval using the character string substituted by said substituting unit.
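Under a hypothetical rotation table like the one below (chosen only for illustration), retrieval works by substituting the keyword and then searching for the result, so the stored data never has to be converted back. Because the table maps each letter to exactly one other letter and leaves all other characters unchanged, a match in the substituted data corresponds to a match at the same position in the original text.

```python
import string

PLAIN = string.ascii_letters
CIPHER = PLAIN[13:] + PLAIN[:13]           # hypothetical table, as an example only
SUBSTITUTE = str.maketrans(PLAIN, CIPHER)


def find_keyword(stored: str, keyword: str) -> int:
    """Search substituted data for a plain-text keyword; -1 if absent."""
    return stored.find(keyword.translate(SUBSTITUTE))


original = "<TITLE>Annual report</TITLE>"
stored = original.translate(SUBSTITUTE)    # what the compressed file would hold
print(find_keyword(stored, "report"))      # 14
print(find_keyword(stored, "summary"))     # -1
```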

A second document managing apparatus of the present invention is comprised of: a display unit, a control character string storing unit, a first reading unit, a first outputting unit, a first control unit, a second reading unit, a second outputting unit, a second control unit, a storing unit, a display control unit, a designating unit, a storage position specifying unit, and a partially decompressing unit.

The display unit displays data. The control charac- 
ter string storing unit stores more than one start control 
character string and more than one end control charac- 
ter string. The first reading unit sequentially reads a 
character contained in document data to be com- 
pressed. The first outputting unit directly outputs the 
character read by said first reading unit as an element 
of a compressed document file, and also outputs said 
read character as an element of an index file. The first 
control unit stops the reading operation of said first read- 
ing unit when said first reading unit reads the same char- 
acter string as any of said start control character strings 
stored in said control character string storing unit. 

The second reading unit commences a reading op- 
eration of a character contained in said document data 
when the reading operation of said first reading unit is 
stopped by said first control unit. The second outputting 
unit outputs a code corresponding to the character read 
by said second reading unit as an element of the
compressed document data. When said second reading unit
reads the same character string as any of the end control
character strings stored in said control character
string storing unit, the second control unit stops the 
reading operation by said second reading unit and also 
restarts the reading operation by said first reading unit. 
The storing unit stores said compressed document file 
and said index file. The display control unit displays the 
respective data segmented by said end control charac- 
ter string and contained in said index file stored in said 
storing unit, on said display unit as an index when a pre- 
determined instruction is issued. The designating unit 
designates one index from the indexes displayed by 
said display control unit. The storage position specifying 
unit specifies a storage position of the index designated 
by said designating unit within said compressed docu- 
ment file. The partially decompressing unit restores data 
located subsequent to the storage position specified by 
said storage position specifying unit and stored in said 
compressed document file until any of the end control
character strings stored in said control character string 
storing unit is restored. 

In other words, the second document managing apparatus of the present invention forms, from the document data, a compressed document file in which non-compressed data (data output from the first outputting unit) is mixed with compressed data (data output from the second outputting unit), and further forms an index file constructed of the data output from the first outputting unit.
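A minimal sketch of producing the compressed document file and the index file in a single pass follows. The list representation, the `ord` stand-in for the coding unit and the `<T>`/`</T>` control strings are assumptions; each index entry here is one run of directly output characters, which is one plausible reading of how the index data are segmented.

```python
from typing import List, Sequence, Tuple, Union

Item = Union[str, int]  # str: direct character; int: code (stand-in: ord)


def compress_with_index(text: str, starts: Sequence[str],
                        ends: Sequence[str]) -> Tuple[List[Item], List[str]]:
    maxlen = max(len(s) for s in [*starts, *ends])
    compressed: List[Item] = []
    index: List[str] = []   # index file: the directly output runs
    run = ""                # current run of directly output characters
    coding, recent = False, ""
    for ch in text:
        recent = (recent + ch)[-maxlen:]
        if coding:
            compressed.append(ord(ch))          # second outputting unit
            if any(recent.endswith(e) for e in ends):
                coding, recent = False, ""
        else:
            compressed.append(ch)               # first outputting unit ...
            run += ch                           # ... also writes to the index file
            if any(recent.endswith(s) for s in starts):
                index.append(run)
                run, coding, recent = "", True, ""
    if run:
        index.append(run)
    return compressed, index


compressed, index = compress_with_index("<T>abc</T>xyz<T>q</T>", ["<T>"], ["</T>"])
print(index)  # ['<T>', 'xyz<T>']
```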

The content of the index file stored in the storing unit is displayed by the display control unit on a display unit such as a CRT. The user designates one index from the plurality of displayed indexes with the designating unit, which is implemented by an input apparatus such as a keyboard and a mouse. The storage position specifying unit specifies the storage position, within said compressed document file, of the index designated by said designating unit. The partially decompressing unit restores the data stored in said compressed document file after the storage position specified by said storage position specifying unit until any of the end control character strings stored in said control character string storing unit is restored.

As described above, since the second document managing apparatus provides the function of decompressing only a portion of the contents of the compressed document file, the contents can be confirmed without restoring the entire compressed document file. Consequently, the second document managing apparatus can process document data with high efficiency while making effective use of the storage capacity of a storing unit such as a hard disk unit.

This second document managing apparatus may further comprise a multiplied size detecting/storing unit that, every time said first outputting unit starts to output elements of the compressed document data, detects the multiplied (that is, cumulative) size of the data output so far as elements of the compressed document data and stores the detected size. In that case, a unit that specifies the storage position of said index within the compressed document file based on the multiplied size stored in said multiplied size detecting/storing unit is employed as the storage position specifying unit.
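Recording the multiplied (cumulative) size can be sketched as follows: whenever direct output begins, the number of bytes written so far is stored, and that number later serves as the storage position of the corresponding index. The one-byte-per-character layout, the high-bit marking used as a stand-in code (`stand_in_code`) and the control strings are assumptions made only so that the example runs; a real coder would emit variable-length codes.

```python
from typing import Callable, List, Sequence, Tuple


def compress_with_offsets(text: str, starts: Sequence[str], ends: Sequence[str],
                          encode: Callable[[str], bytes]) -> Tuple[bytes, List[int]]:
    maxlen = max(len(s) for s in [*starts, *ends])
    out = bytearray()
    offsets: List[int] = []   # cumulative output size at the start of each direct run
    coding, recent, run_open = False, "", False
    for ch in text:
        recent = (recent + ch)[-maxlen:]
        if coding:
            out += encode(ch)
            if any(recent.endswith(e) for e in ends):
                coding, recent = False, ""
        else:
            if not run_open:                 # direct output starts a new run
                offsets.append(len(out))
                run_open = True
            out += ch.encode("ascii")        # direct output, ASCII assumed
            if any(recent.endswith(s) for s in starts):
                coding, recent, run_open = True, "", False
    return bytes(out), offsets


def stand_in_code(ch: str) -> bytes:
    """Hypothetical one-byte 'code': the ASCII value with the high bit set."""
    return bytes([ord(ch) | 0x80])


data, offsets = compress_with_offsets(
    "<T>abc</T>xyz<T>q</T>", ["<T>"], ["</T>"], stand_in_code)
print(offsets)                           # [0, 10]
print(data[offsets[1]:offsets[1] + 6])   # b'xyz<T>'
```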

Also, the second document managing apparatus may employ a partially decompressing unit constructed of a restore-not-required data recognizing unit, a first data reading unit, a first decoding unit, a first reading control unit, a second data reading unit, a second decoding unit, a second reading control unit, and a third reading control unit.

The restore-not-required data recognizing unit recognizes the data that precede, in said compressed document file, the storage position specified by said storage position specifying unit as already processed data. The first data reading unit sequentially reads the unprocessed data contained in said compressed document file one character at a time. The first decoding unit outputs the data read by said first data reading unit as a decoded result. The first reading control unit stops the reading operation of said first data reading unit when said first decoding unit outputs the same character string as any of said start control character strings stored in said control character string storing unit.

The second data reading unit commences a reading 
operation of the unprocessed data contained in said 
compressed document file when the reading operation 
of said first data reading unit is stopped by said first read- 
ing control unit. The second decoding unit outputs a 
character obtained by decoding the data read by said 
second data reading unit. 

When said second decoding unit outputs the same character string as any of the end control character strings stored in said control character string storing unit, the second reading control unit stops the reading operation by said second data reading unit. The third reading control unit restarts the reading operation of said first data reading unit after the control operation is carried out by said second reading control unit, in the case that the character string read by said second data reading unit is not equal to the end control character string corresponding to the start control character string contained in the tail of the index designated by said designating unit.

When the partially decompressing unit with the 
above-described arrangement is employed, the data 
within the range corresponding to the index designated 
by the designating unit can be restored. 
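The partial decompression described above can be sketched as follows, assuming a hypothetical byte layout in which directly output characters are plain ASCII bytes and each coded character is a single byte with the high bit set (a stand-in for a real variable-length code). Data before `offset` are skipped as already processed. Restoration stops at the end control string paired with the start control string at the tail of the designated index; any other end control string sends the loop back to direct reading, in the manner of the third reading control unit. The `pairs` mapping and the helper `coded` are illustrative assumptions.

```python
from typing import Dict, List


def partial_decompress(data: bytes, offset: int, pairs: Dict[str, str]) -> str:
    """Restore from `offset` until the end string matching the index's start string."""
    starts, ends = list(pairs), list(pairs.values())
    maxlen = max(len(s) for s in starts + ends)
    out: List[str] = []
    recent, decoding, target_end = "", False, None
    for b in data[offset:]:                       # bytes before offset: already processed
        ch = chr(b & 0x7F) if decoding else chr(b)
        out.append(ch)
        recent = (recent + ch)[-maxlen:]
        if decoding:
            hit = next((e for e in ends if recent.endswith(e)), None)
            if hit is not None:
                if hit == target_end:
                    break                         # region of the designated index restored
                decoding, recent = False, ""      # other end string: resume direct reading
        else:
            hit = next((s for s in starts if recent.endswith(s)), None)
            if hit is not None:
                if target_end is None:            # start string at the tail of the index
                    target_end = pairs[hit]
                decoding, recent = True, ""
    return "".join(out)


def coded(s: str) -> bytes:
    """Hypothetical stand-in code: ASCII with the high bit set."""
    return bytes(ord(c) | 0x80 for c in s)


data = b"<T>" + coded("abc</T>") + b"xyz<T>" + coded("q</T>")
print(partial_decompress(data, 10, {"<T>": "</T>"}))  # xyz<T>q</T>
```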

A third document managing apparatus of the present invention is comprised of a display unit, a control character string storing unit, a first reading unit, a first outputting unit, a first control unit, a second reading unit, a second outputting unit, a second control unit, a multiplied size detecting/storing unit, a storing unit, a first display control unit, a designating unit, a decoding-not-required data recognizing unit, a first data reading unit, a first decoding unit, a first decoding control unit, a second data reading unit, a second decoding unit, a second decoding control unit, and a third decoding control unit.

The display unit displays data. The control charac- 
ter string storing unit stores more than one start control 
character string and more than one end control character
string. The first reading unit sequentially reads a
character contained in document data to be compressed.
The first outputting unit directly outputs a code
obtained by statically coding the character read by said 
first reading unit as an element of a compressed docu- 
ment file, and also outputs said read character as an 
element of an index file. The first control unit stops the 
reading operation of said first reading unit when said first 
reading unit reads the same character string as any of 
said start control character strings stored in said control 
character string storing unit. 
The second reading unit commences a reading operation
of a character contained in said document data
when the reading operation of said first reading unit is 
stopped by said first control unit. The second outputting 
unit outputs a code obtained by dynamically coding the 
character read by said second reading unit as an ele- 
ment of compressed document data. When said second 
reading unit reads the same character string as any of 
the end control character strings stored in said control
character string storing unit, the second control unit 
stops the reading operation by said second reading unit, 
initializes a model employed to dynamically encode the 
character by said second outputting unit, and also restarts
the reading operation by said first reading unit. The mul- 
tiplied size detecting/storing unit detects a multiplied 
size of data which have been outputted as the elements 
of the compressed document file by said first outputting 
unit and said second outputting unit every time said first 
outputting unit starts to output said character, and stores
said detected multiplied size. The storing unit stores 
said compressed document file and said index file. The 
first display control unit displays the respective data seg- 
mented by said start control character string and con- 
tained in said index file stored in said storing unit, on 
said display unit as an index when a predetermined in- 
struction is issued. The designating unit designates one 
index from the indexes displayed by said first display 
control unit. The decoding-not-required data recogniz- 
ing unit specifies a storage position of the index desig- 
nated by said designating unit within said compressed 
document file based on the multiplied size stored in said 
multiplied size detecting/storing unit, and recognizes 
data preceding said designated index within said com- 
pressed document file as processed data. The first data 
reading unit reads unprocessed data contained in said 
compressed document file. The first decoding unit out- 
puts a character obtained by statically decoding the data 
read by said first data reading unit as a decoded result. 
The first decoding control unit stops the reading opera- 
tion of said first data reading unit when said first decod- 
ing unit decodes the same character string as any of 
said start control character strings stored in said control 
character string storing unit. The second data reading 
unit commences a reading operation of unprocessed 
data contained in said compressed document file when 
the reading operation of said first data reading unit is 
stopped by said first decoding control unit. The second 
decoding unit outputs a character obtained by dynamically decoding the data read by said second data reading unit. When said second decoding unit decodes the same character string as any of the end control character strings stored in said control character string storing unit, the second decoding control unit stops the reading operation by said second data reading unit and also initializes a model used to dynamically decode the data by said second decoding unit. The third decoding control unit restarts the reading operation of said first data reading unit after the control operation is carried out by said second decoding control unit, in the case that the character string decoded by said second decoding unit is not equal to the end control character string corresponding to the start control character string contained in the tail of the index designated by said designating unit.

In other words, the third document managing apparatus of the present invention forms, from the document data, a compressed document file in which statically encoded non-compressed data (data output from the first outputting unit) is mixed with dynamically encoded compressed data (data output from the second outputting unit), and further forms an index file constructed of the non-compressed data corresponding to the compressed document data output from the first outputting unit.

The content of the index file stored in the storing unit is displayed by the display control unit on a display unit such as a CRT. The user designates one index from the plurality of displayed indexes with the designating unit, which is implemented by an input apparatus such as a keyboard and a mouse.

The decoding-not-required data recognizing unit specifies the storage position, within said compressed document file, of the index designated by said designating unit based on the multiplied size stored in said multiplied size detecting/storing unit, and recognizes the data preceding said designated index within said compressed document file as processed data. Then, for the data following the data recognized as processed by this decoding-not-required data recognizing unit, the process operations by the respective units are repeated until the end control character string corresponding to the start control character string contained in the tail of the index designated by the user is restored.

As described above, since the third document managing apparatus forms the compressed document file by compressing the document data with two sorts of coding methods, the size of the compressed document file can be made small, and the storage capacity of a storing unit such as a hard disk unit can be used effectively. Also, since an index file that can be retrieved by keyword is formed, the content of the compressed document file can be inferred without restoring it. Furthermore, since a function that restores only a portion of the contents of the compressed document file is provided, only the necessary portion needs to be restored. As a result, the document data can be processed effectively by the third document managing apparatus.

A data compressing method according to the present invention may compress original data in which start control character strings and end control character strings have been inserted before and after some of the data elements.

The data compressing method of the present invention is comprised of a retrieving step and a data processing step.

In the retrieving step, the start control character string and the end control character string are retrieved from said original data. In the data processing step, when said start control character string is retrieved at said retrieving step, a process operation is commenced to output encoded data obtained by coding the original data that follow the retrieved string; and when said end control character string is retrieved at said retrieving step, another process operation is commenced to output the subsequent original data directly.

As described above, the data compressing method of the present invention forms compressed data in which non-compressed data is mixed with compressed data, namely compressed data that can be retrieved by keyword.

The compressed data file formed by this data com- 
pressing method is restored by the following data de- 
compressing method. 

That is, the data decompressing method of the 
present invention is comprised of a judging step and a 
data processing step. 

In the judging step, it is judged whether or not the start control character string or the end control character string is present at the tail of the restored data. In the data processing step, when the presence of said start control character string is judged at said judging step, a process operation is commenced to output, as the restored result, characters obtained by decoding the subsequent compressed data. Also, when the presence of said end control character string is judged at said judging step, another process operation is commenced to output the subsequent compressed data directly as the restored result.

In the data compressing method of the present in- 
vention, it is also possible to employ as the data 
processing step, such a step that performs the coding 
operation by employing a dynamic coding model, and 
initializes said dynamic coding model when said end 
control character string is retrieved at said retrieving 
step. 

When the compressed data formed by this data compressing method is restored, it is also possible to employ, as the above-described data processing step of the data decompressing method, a step that performs the decoding operation by employing a dynamic coding model and initializes said dynamic coding model when the presence of said end control character string is judged at said judging step.

In the data compressing method of the present in- 
vention, it is also possible to employ as the data 
processing step, such a step where when the process 
operation to output the encoded data is commenced, 
said end control character string retrieved at said retriev- 
ing step is outputted as an element of the compressed 
data. 



When the compressed data formed by this data compressing method is restored, it is also possible to employ as the above-described data processing step of the data decompressing method such a step where, when a process operation to output a decoded character is commenced, the firstly decoded control character string is not handled as part of the restored result.

Also, in the data compressing method of the present invention, it is possible to employ as the data processing step such a step where, when said end control character string is retrieved in said retrieving step, a process operation is commenced to output, as elements of the compressed data, data obtained by substituting the subsequent original data by employing a predetermined substitution table.

When the compressed data formed by this data compressing method is restored, it is also possible to employ as the above-described data processing step of the data decompressing method such a step where, when the presence of said end control character string is judged in said judging step, a process operation is commenced to output, as the restored result, data obtained by substituting the subsequent compressed data by employing a predetermined substitution table.

BRIEF DESCRIPTION OF THE DRAWINGS 

A more complete understanding of the teachings of the present invention may be acquired by referring to the accompanying drawings, in which:



Fig. 1 is a schematic block diagram showing an arrangement of a document managing apparatus according to a first embodiment of the present invention;
Fig. 2 is a functional block diagram for explaining a sequence to form a compressed document file by employing the document managing apparatus according to the first embodiment of the present invention;
Fig. 3 is a flow chart for describing the compressed document file forming sequence executed by the document managing apparatus of the first embodiment of the present invention;
Fig. 4 represents an example of document data described in the SGML format;
Fig. 5 schematically shows a compressed document file formed from the document data shown in Fig. 4 by the document managing apparatus according to the first embodiment of the present invention;
Fig. 6 is a functional block diagram for explaining the decompressing operation by the document managing apparatus according to the first embodiment of the present invention;
Fig. 7 is a flow chart for describing a sequence to restore a compressed document file executed by the document managing apparatus according to the first embodiment of the present invention;
Fig. 8 is a functional block diagram for explaining a sequence to form a document file by a document managing apparatus according to a second embodiment of the present invention;
Fig. 9 is a flow chart for describing the sequence to form the compressed document file executed by the document managing apparatus according to the second embodiment of the present invention;
Fig. 10 is a functional block diagram for explaining a sequence to restore the compressed document by the document managing apparatus according to the second embodiment of the present invention;
Fig. 11 is a flow chart for describing the sequence to restore the compressed document file executed by the document managing apparatus according to the second embodiment of the present invention;
Fig. 12 is a flow chart for describing a sequence to form a compressed document file by an apparatus according to a third embodiment of the present invention;
Fig. 13 schematically indicates the compressed document file formed by the document managing apparatus according to the third embodiment of the present invention;
Fig. 14 schematically shows an index file formed by the document managing apparatus according to the third embodiment of the present invention;
Fig. 15 is a flow chart for describing a process to restore an index-corresponding region executed in the document managing apparatus according to the third embodiment of the present invention;
Fig. 16 is a flow chart for describing a sequence to form a compressed document file by the apparatus according to the third embodiment of the present invention;
Fig. 17 is a flow chart for describing an entire decompressing process executed in the document managing apparatus according to the third embodiment of the present invention;
Fig. 18 is a flow chart for describing a partial decompressing process executed in the document managing apparatus according to the third embodiment of the present invention;
Fig. 19 is an explanatory diagram for explaining a 
relationship between the index and the region re- 
stored in the index corresponding region decom- 
pressing process; 

Fig. 20 is a flow chart for describing a sequence to so 
form a compressed document file by an apparatus 
according to a fourth embodiment of the present in- 
vention; 

Fig. 21 schematically indicates the compressed 
document file formed by the document managing 55 
apparatus according to the fourth embodiment of 
the present invention; and 
Fig. 22 is a flow chart for describing a partial decom- 



pressing process executed by the document man- 
aging apparatus according to the fourth embodi- 
ment of the present invention. 

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

Before describing various preferred embodiments of the present invention, an overview of the description format of the document data managed by the document managing apparatus according to the present invention will be given. The document managing apparatus according to the present invention is intended to manage document data in which characters for controlling the document and the document text itself are stored in the same data. In this specification, document data described in the SGML (Standard Generalized Markup Language) format is managed by the document managing apparatus according to the various embodiments of the present invention. The SGML format is the international document format standard established by ISO in 1986 (ISO 8879). In document data described in the SGML format, a control character string, called a "tag", is placed before and after a specific element contained in the document data and corresponds to the content of that element. For instance, a start tag "<TITLE>" is placed before an element representing a document title, and an end tag "</TITLE>" is placed after that element.

FIRST DOCUMENT MANAGING APPARATUS 

A document managing apparatus according to a first embodiment of the present invention produces, when document data is filed, a file in which compressed data and non-compressed data are mixed (hereinafter referred to as a "compressed document file").

Fig. 1 schematically shows the arrangement of the document managing apparatus according to the first embodiment of the present invention. As indicated in this drawing, the document managing apparatus according to the first embodiment comprises a storage apparatus 11, an input apparatus 12, a display apparatus 13, and a data processing apparatus 14. The storage apparatus 11 is a so-called "magnetic disk storage apparatus" for storing compressed document files and the like. The input apparatus 12 is constituted by a keyboard, a mouse, and peripheral devices thereof. The display apparatus 13 is constructed of a CRT (cathode-ray tube) and peripheral devices thereof, and is employed to display the restored result of a compressed document file stored in the storage apparatus 11.

The data processing apparatus 14 is mainly arranged around a CPU (central processing unit) and has a function for editing document data. In response to instructions issued from the input apparatus 12, the data processing apparatus 14 executes a process operation to form a compressed document file from document data, and another process operation to restore a compressed document file to document data.

FORMING OPERATION OF COMPRESSED DOCUMENT FILE

The operations of this first document managing apparatus (namely, of the data processing apparatus 14) will now be explained.

Referring first to the functional block diagram of Fig. 2, a description will be made of the operation by which the data processing apparatus 14 forms a compressed document file.

As indicated in Fig. 2, the data processing apparatus 14 is constructed of a switch 107; an input character string holding unit 103, a first character string holding unit 101, and a coding start character string retrieving unit 105, which are provided on the side of a terminal S2 of the switch 107; and a second character string holding unit 102, a context holding unit 104, and a coding end character string retrieving unit 106, which are provided on the side of the other terminal S1 of the switch 107. The data processing apparatus 14 further includes a code holding unit 108, a coding unit 109, and a code updating unit 110, which are also provided on the side of the terminal S1 of the switch 107.

The document data to be compressed is supplied character by character from an input terminal 130 to the switch 107. The switch 107 outputs each inputted character from either the terminal S1 or the terminal S2. When formation of a compressed document file begins, the switch 107 outputs characters from the terminal S2.

First, a description is made of the operations of the respective units in the data processing apparatus 14 while the switch 107 outputs characters from the terminal S2. In this case, the input character string holding unit 103, the first character string holding unit 101, and the coding start character string retrieving unit 105 function. Each character derived from the terminal S2 is outputted from an output terminal 131 as constituent data of the compressed document file, and is also inputted to the input character string holding unit 103.

The input character string holding unit 103 has a function to hold a character string of up to a preselected number (N1) of characters, and updates the held character string with the characters supplied from the terminal S2. That is, when a character string of M (M < N1) characters is held and a character is supplied from the terminal S2, the input character string holding unit 103 appends the supplied character to the end of this character string. When a character string of N1 characters is held and a character is supplied from the terminal S2, the input character string holding unit 103 deletes one character from the head of this character string and appends the character derived from the terminal S2 to its end.
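The holding rule just described is a fixed-capacity buffer that discards its oldest character once full. The Python sketch below illustrates it under stated assumptions: a `collections.deque` with `maxlen=N1` serves as the buffer, the class name `InputStringHolder` is hypothetical, and N1 is derived from the two coding start character strings used in this description, which gives 13 (the length of "</SUBSECTION>").

```python
from collections import deque


class InputStringHolder:
    """Hypothetical model of the input character string holding unit 103."""

    def __init__(self, n1: int) -> None:
        # A deque with maxlen drops its leftmost (oldest) item when a new one
        # is appended while full, matching the head-deletion rule above.
        self.buf = deque(maxlen=n1)

    def push(self, ch: str) -> str:
        self.buf.append(ch)
        return "".join(self.buf)


START_STRINGS = ["</SECTION>", "</SUBSECTION>"]
N1 = max(len(s) for s in START_STRINGS)   # 13, the length of "</SUBSECTION>"

holder = InputStringHolder(N1)
for ch in "****</SECTION":                 # 13 characters: the buffer is now full
    held = holder.push(ch)
updated = holder.push(">")                 # the oldest "*" is dropped
print(held)      # ****</SECTION
print(updated)   # ***</SECTION>
```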

The first character string holding unit 101 holds several coding start character strings (</SECTION>, </SUBSECTION>, etc.) selected from the end tags. It should be noted that the maximum number N1 of characters held by the input character string holding unit 103 is equal to the number of characters of the longest coding start character string held in the first character string holding unit 101.

Every time a new character is inputted into the input character string holding unit 103, the coding start character string retrieving unit 105 checks whether the tail of the character string held in the input character string holding unit 103 coincides with any of the coding start character strings held in the first character string holding unit 101. When no such coincident character string is present, the coding start character string retrieving unit 105 performs no further operation and waits for the entry of the next character. On the other hand, when a character string coincident with a coding start character string is present, the coding start character string retrieving unit 105 operates the switch 107 so that data is outputted from the terminal S1 instead of the terminal S2.

For instance, when the character string "****</SECTION" is held in the input character string holding unit 103 and the character ">" is supplied from the terminal S2, this character string is updated to "***</SECTION>". As a consequence, the coding start character string retrieving unit 105 finds the coding start character string "</SECTION>" at the tail of the character string stored in the input character string holding unit 103, and instructs the switch 107 to change the data destination. Up to this point, non-compressed data is outputted from the output terminal 131.
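The tail comparison performed by the coding start character string retrieving unit 105 can be sketched in Python as follows, applied to the worked example above. The helper name `find_tail_match` is hypothetical, and testing each string in turn is only one simple way to perform the comparison.

```python
from typing import Optional, Sequence

START_STRINGS = ("</SECTION>", "</SUBSECTION>")   # coding start strings


def find_tail_match(held: str, targets: Sequence[str]) -> Optional[str]:
    """Return the target string that the held string ends with, else None.

    Hypothetical helper modelling the coding start character string
    retrieving unit 105; it is called once per newly held character.
    """
    for target in targets:
        if held.endswith(target):
            return target
    return None


before = find_tail_match("****</SECTION", START_STRINGS)
after = find_tail_match("***</SECTION>", START_STRINGS)
print(before)   # None -> keep outputting from S2
print(after)    # </SECTION> -> switch 107 over to S1
```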

Next, a description will be made of the operations performed while characters are outputted from the terminal S1 of the switch 107. In this case, the following units of the first document managing apparatus function: the second character string holding unit 102, the context holding unit 104, the coding end character string retrieving unit 106, the code holding unit 108, the coding unit 109, and the code updating unit 110. The second character string holding unit 102, the context holding unit 104, and the coding end character string retrieving unit 106 operate in a manner similar to the first character string holding unit 101, the input character string holding unit 103, and the coding start character string retrieving unit 105, respectively.

In other words, the second character string holding unit 102 holds several coding end character strings (<SECTION>, <SUBSECTION>, etc.) selected from the start tags. The context holding unit 104 is capable of holding a character string of the same length as the longest coding end character string held by the second character string holding unit 102, and updates the internally held character string with each character supplied from the terminal S1. The context holding unit 104 also supplies a character string (context) consisting of a preselected number of characters at the tail of the held string to the code holding unit 108.

Every time a new character is inputted into the context holding unit 104, the coding end character string retrieving unit 106 judges whether the tail of the character string in the context holding unit 104 coincides with any of the coding end character strings held in the second character string holding unit 102. When there is no coincident character string, the coding end character string retrieving unit 106 performs no operation and waits for the entry of the next character. On the other hand, when the tail coincides with one of the coding end character strings, the coding end character string retrieving unit 106 switches the data destination of the switch 107 from the terminal S1 back to the terminal S2.

The code holding unit 108, the coding unit 109, and the code updating unit 110 dynamically code the characters sequentially supplied from the terminal S1. These units operate as follows.

That is, the code holding unit 108 holds, for each context, a code table used in the coding operation, and looks up the code table corresponding to the context notified from the context holding unit 104; this code table is subsequently updated. The coding unit 109 determines the code corresponding to the character inputted from the terminal S1 by employing the code table looked up and updated by the code holding unit 108. The coding unit 109 then outputs the determined code (compressed data) from the output terminal 131. Compressed data continues to be outputted until the switch 107 is switched from the terminal S1 back to the terminal S2. When the coding of a character is complete, the code updating unit 110 updates the content of the code table used for that character, so that the relationship between characters and codes reflects the increased appearance frequency of this character.
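The description does not specify how codes are assigned within a code table, so the Python sketch below uses an assumed stand-in: each context keeps its characters ordered by descending frequency, a character's code is its rank in that list, and a first occurrence is emitted as a hypothetical `("NEW", ch)` escape. The class name `ContextCoder` and the one-character context length are also assumptions. The sketch shows the two behaviours described above: one table per context, and an update after every character so that frequent characters move to smaller codes.

```python
from collections import defaultdict
from typing import Dict, List, Tuple, Union

Code = Union[int, Tuple[str, str]]   # a rank, or ("NEW", ch) on first occurrence


class ContextCoder:
    """Hypothetical model of units 104 and 108-110 with rank-based codes."""

    def __init__(self, k: int = 1) -> None:
        self.k = k                                   # assumed context length
        self.order: Dict[str, List[str]] = defaultdict(list)
        self.count: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
        self.history = ""                            # characters coded so far

    def encode(self, ch: str) -> Code:
        ctx = self.history[-self.k:] if self.k else ""
        table = self.order[ctx]                      # look up this context's table
        code: Code = table.index(ch) if ch in table else ("NEW", ch)
        # Update step: count the character and re-rank the table so that more
        # frequent characters receive smaller codes (the sort is stable on ties).
        counts = self.count[ctx]
        counts[ch] += 1
        if ch not in table:
            table.append(ch)
        table.sort(key=lambda c: -counts[c])
        self.history += ch
        return code


coder = ContextCoder(k=1)
codes = [coder.encode(ch) for ch in "xaxbxbxb"]
print(codes[1::2])   # codes of the characters that follow "x"
```

In the context "x", the character "b" is first coded as rank 1 and, once its count overtakes that of "a", as rank 0. A practical coder would additionally map ranks to variable-length bit codes (for example Huffman or arithmetic coding); that step is omitted here.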

Referring now to Figs. 3 to 5, a detailed explanation will be made of the sequence of operations by which the document managing apparatus according to the first embodiment of the present invention forms the compressed document file. Fig. 3 is a flow chart describing the sequence of operations by which the data processing apparatus 14 forms the compressed document file. Fig. 4 schematically represents one example of document data to be compressed by this first document managing apparatus. Fig. 5 schematically shows a summary of the compressed document file formed by this first document managing apparatus from the document data shown in Fig. 4. In the following description, it is assumed that "</SECTION>" and "</SUBSECTION>" are set as the coding start character strings, and that "<SECTION>" and "<SUBSECTION>" are set as the coding end character strings.

To form the compressed document file, a non-compressed data output process loop, which directly outputs the respective characters constituting the document data, and a compressed data output process loop, which compresses the respective characters and outputs the compressed result, are alternately repeated. As indicated in Fig. 3, when an instruction is issued to compress the document data, the data processing apparatus 14 first performs the non-compressed data output process loop (steps S101 to S103).

In this non-compressed data output process loop, one character (the character to be processed) of the document data is directly outputted from the terminal S2 side of Fig. 2 and thereby written into the compressed document file (step S101). Next, a judgment is made as to whether the process operation is complete for all characters constituting the document data (step S102). When the process operation has not yet ended for all of the characters ("N" at step S102), another judgment is made as to whether the character string processed so far ends with any of the coding start character strings (step S103).

When the processed character string does not end with any of the coding start character strings ("N" at step S103), the process operation from step S101 is executed again. On the other hand, when the character string processed so far ends with one of the coding start character strings ("Y" at step S103), the compressed data output process loop (steps S104 to S107) is commenced.

For instance, when the compressed document file for the document data shown in Fig. 4 is formed, the first coding start character string to appear is "</SECTION>" (second line). As a result, the respective characters from the head of this document data up to this coding start character string "</SECTION>" in the second line are directly outputted and stored in the compressed document file. Consequently, data having the same content as the document data is stored in the head portion of the compressed document file, as illustrated in Fig. 5. The compressed data output process operation is then commenced for the character following this coding start character string "</SECTION>".

Referring back to Fig. 3, the description of the compressed document file forming process will now be continued.
In the compressed data output process loop, the next character is read from the document data on the terminal S1 side, and the code corresponding to this character is outputted from the coding unit (step S104). At step S104, the corresponding code is determined by referring to the context of this character. Thereafter, the content of the code table associated with the context used in the coding operation is updated (step S105).

Next, a judgment is made as to whether the process operation is complete for all characters constituting the document data (step S106). When the process operation has not yet ended for all of the characters ("N" at step S106), another judgment is made as to whether the string formed by the most recently coded characters ends with any of the coding end character strings (step S107).

When the coded character string does not end with any of the coding end character strings ("N" at step S107), the process operation from step S104 is executed again. On the other hand, when the coded character string ends with one of the coding end character strings ("Y" at step S107), the non-compressed data output process loop (steps S101 to S103) is commenced again.

For instance, in the document data shown in Fig. 4, the first coding end character string to appear from the third line onward is "<SECTION>" (fourth line). Consequently, the respective characters from the beginning of the third line up to the coding end character string "<SECTION>" in the fourth line are encoded, and the codes are outputted. As a result, the characters of this portion are stored as compressed data in the compressed document file, as shown in the third line of Fig. 5. The non-compressed data output process loop and the compressed data output process loop are then repeated for the text (2. SCOPE OF CLAIM FOR A PATENT </SECTION>...) beginning with the character following "<SECTION>". Eventually, a compressed document file is formed in which only the portions sandwiched between a coding end character string and a control character string designated as a coding start character string are left non-compressed, while the other portions (including other control character strings such as <PARAGRAPH> and <TT>) are compressed.

The compressed document file forming process is completed either when the process operation for all of the data ends in the non-compressed data output process loop ("Y" at step S102) or when it ends in the compressed data output process loop ("Y" at step S106).
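The alternation of the two loops in Fig. 3 can be sketched in Python as follows. This is an illustration under stated assumptions, not the apparatus itself: the compressed document file is modelled as a list of tokens in which a `str` is a raw character and an `int` is a code; the coder is a placeholder (`encode_char` simply returns the code point, so nothing is actually compressed); and one bounded tail of all processed characters stands in for the separate holding units 103 and 104. All function names are hypothetical.

```python
from typing import List, Union

Token = Union[str, int]   # str: raw character, int: code (assumed file model)

START_STRINGS = ("</SECTION>", "</SUBSECTION>")   # coding start strings
END_STRINGS = ("<SECTION>", "<SUBSECTION>")       # coding end strings
WINDOW = max(len(s) for s in START_STRINGS + END_STRINGS)


def encode_char(ch: str) -> int:
    # Placeholder for the context-adaptive coder: returns the code point,
    # so this sketch shows the switching logic but does not compress.
    return ord(ch)


def compress(text: str) -> List[Token]:
    out: List[Token] = []
    tail = ""          # bounded tail of processed characters
    coding = False     # False: switch 107 at S2 (raw); True: at S1 (coded)
    for ch in text:
        tail = (tail + ch)[-WINDOW:]
        if not coding:
            out.append(ch)                              # step S101
            if tail.endswith(START_STRINGS):            # step S103
                coding = True
        else:
            out.append(encode_char(ch))                 # steps S104-S105
            if tail.endswith(END_STRINGS):              # step S107
                coding = False
    return out        # loop ends when the input is exhausted (S102 / S106)


doc = "<TITLE>Spec</TITLE></SECTION><P>secret</P><SECTION>Claims</SECTION>x"
tokens = compress(doc)
raw_text = "".join(t for t in tokens if isinstance(t, str))
print(raw_text)   # <TITLE>Spec</TITLE></SECTION>Claims</SECTION>
```

The heading text "Claims" stays in raw form, which is what allows a keyword search on the stored file without decompressing it.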

With reference to the functional block diagram shown in Fig. 6, a description will now be made of the operations by which the document managing apparatus (data processing apparatus 14) of the first embodiment restores a compressed document file.

The data constituting the compressed document file is supplied from an input terminal 230 to a switch 207. The switch 207 outputs each inputted item of data from either the terminal S1 or the terminal S2.

The following description concerns the operations of the various units while the switch 207 supplies the data to the terminal S2. It should be noted that the decompressing operation of the compressed document file is commenced with the switch 207 outputting data from the terminal S2.

When data is supplied from the terminal S2 of the switch 207, an input character string holding unit 203, a first character string holding unit 201, and a decoding start character string retrieving unit 205 function. The data derived from the terminal S2 of the switch 207 is outputted from an output terminal 231 as a single character of the document data, and is also supplied to the input character string holding unit 203.

The input character string holding unit 203 holds a character string of at most N1 characters, and updates the internally held character string with each character supplied from the terminal S2. The first character string holding unit 201 holds, as decoding start character strings, the same character strings (</SECTION>, </SUBSECTION>, etc.) as the coding start character strings held by the first character string holding unit 101. Every time new data (a character) is inputted into the input character string holding unit 203, the decoding start character string retrieving unit 205 judges whether the tail of the character string in the input character string holding unit 203 coincides with any of the decoding start character strings held in the first character string holding unit 201. When there is no coincident character string, the decoding start character string retrieving unit 205 performs no operation and waits for the entry of the next data. On the other hand, when the tail coincides with one of the decoding start character strings, the subsequent character strings are compressed and must be decoded. The decoding start character string retrieving unit 205 therefore switches the data destination of the switch 207 from the terminal S2 to the terminal S1.

Next, the operations performed while data (codes) is outputted from the terminal S1 of the switch 207 will be described. In this case, a code holding unit 208, a decoding unit 209, a code updating unit 210, a second character string holding unit 202, a context holding unit 204, and a decoding end character string retrieving unit 206 function.

The code holding unit 208, the decoding unit 209, and the code updating unit 210 adaptively decode the data (codes) sequentially supplied from the terminal S1. These units operate as follows.

That is, the code holding unit 208 holds, for each context, a code table used in the decoding operation, and looks up the code table corresponding to the context notified from the context holding unit 204 (described below); this code table is subsequently updated. The decoding unit 209 decodes the code inputted from the terminal S1 by employing the code table looked up and updated by the code holding unit 208. The character corresponding to the decoded result is then supplied to the output terminal 231 and the context holding unit 204. After the decoding unit 209 completes the decoding operation, the code updating unit 210 updates the content of the code table used in the decoding operation, so that the relationship between characters and codes reflects the increased appearance frequency of the decoded character.
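Because the decoder applies exactly the same table update as the coder after every character, both sides keep identical code tables without the tables ever being stored in the file. The Python sketch below illustrates this with an assumed stand-in code assignment (characters ranked by frequency within each context, with a hypothetical `("NEW", ch)` escape for first occurrences). The names `RankTables`, `encode` and `decode`, and the one-character context, are assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Tuple, Union

Code = Union[int, Tuple[str, str]]   # a rank, or ("NEW", ch) on first occurrence


class RankTables:
    """Hypothetical per-context code tables: characters ranked by frequency."""

    def __init__(self) -> None:
        self.order: Dict[str, List[str]] = defaultdict(list)
        self.count: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))

    def update(self, ctx: str, ch: str) -> None:
        # Identical on both sides: count the character, then re-rank.
        counts = self.count[ctx]
        counts[ch] += 1
        table = self.order[ctx]
        if ch not in table:
            table.append(ch)
        table.sort(key=lambda c: -counts[c])


def encode(tables: RankTables, ctx: str, ch: str) -> Code:
    table = tables.order[ctx]
    code: Code = table.index(ch) if ch in table else ("NEW", ch)
    tables.update(ctx, ch)
    return code


def decode(tables: RankTables, ctx: str, code: Code) -> str:
    # Look the rank up in this context's table, then apply the same update
    # as the coder so that both tables stay identical.
    ch = code[1] if isinstance(code, tuple) else tables.order[ctx][code]
    tables.update(ctx, ch)
    return ch


text = "abracadabra"
coder_tables, decoder_tables = RankTables(), RankTables()
codes: List[Code] = []
prev = ""                                  # one-character context (assumed)
for ch in text:
    codes.append(encode(coder_tables, prev, ch))
    prev = ch

restored, prev = [], ""
for code in codes:
    ch = decode(decoder_tables, prev, code)
    restored.append(ch)
    prev = ch
print("".join(restored))                   # abracadabra
```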

The context holding unit 204 is capable of holding a character string of N2 characters, and updates the held character string with each character supplied from the decoding unit 209. The context holding unit 204 supplies a character string consisting of a preselected number of characters at the tail of the held character string to the code holding unit 208 as a context. The second character string holding unit 202 holds, as decoding end character strings, the same character strings as the coding end character strings held by the second character string holding unit 102 (see Fig. 2). It should be noted that "N2" denotes the number of characters in the longest decoding end character string held in the second character string holding unit 202.

Every time a character is newly inputted into the context holding unit 204, the decoding end character string retrieving unit 206 judges whether the tail of the character string held in the context holding unit 204 coincides with any of the decoding end character strings held in the second character string holding unit 202. When there is no coincident character string, the decoding end character string retrieving unit 206 performs no operation and waits for the next decoded result. On the other hand, when the tail coincides with a decoding end character string, the character string following it is non-compressed, so the decoding end character string retrieving unit 206 switches the data destination of the switch 207 from the terminal S1 to the terminal S2.

Referring now to Fig. 4 and Fig. 5, which were used to explain the formation of the compressed document file, and to Fig. 7, a flow chart describing the sequence of operations by which the data processing apparatus 14 restores the compressed document file, the restoring operation of the document managing apparatus according to the first embodiment will now be described.

As indicated in Fig. 7, when decompression of the compressed document file is instructed, the data processing apparatus 14 first executes a non-compressed data process loop (steps S201 to S203). In this non-compressed data process loop, one character stored in the compressed document file is directly outputted as a restored result (step S201). Subsequently, a check is made as to whether the process operation has been accomplished for all of the data stored in the compressed document file (step S202). When the process operation has not yet ended for all of the data ("N" at step S202), another check is made as to whether the string of characters outputted so far ends with any of the decoding start character strings (step S203).

When the string of outputted characters does not end with any of the decoding start character strings ("N" at step S203), the process operation from step S201 is executed again. Conversely, when it ends with one of the decoding start character strings ("Y" at step S203), the compressed data process loop (steps S204 to S207) is commenced.

For example, when the compressed document file shown in Fig. 5 is processed, the first decoding start character string found in the non-compressed data process loop is "</SECTION>" (second line). Accordingly, the respective characters up to this decoding start character string "</SECTION>" are directly outputted, so that the first two lines of Fig. 4 are reproduced. The compressed data process loop is then commenced for the data following this decoding start character string "</SECTION>".

Referring back to Fig. 7, the description of the decompressing process for the compressed document file will now be continued.

In this compressed data process loop, the necessary amount of data (codes) of the compressed document file is read, and the characters corresponding to the decoded result of the codes are outputted (step S204). This decoding operation is carried out by referring to the character string (context) that has already been decoded. Thereafter, the content of the code table associated with the context used in the decoding operation is updated (step S205).

Next, a check is made as to whether the process operation has been accomplished for all of the data stored in the compressed document file (step S206). When the process operation has not yet ended for all of the data ("N" at step S206), another check is made as to whether the string of decoded characters ends with any of the decoding end character strings (step S207).

When the string of decoded characters does not end with any of the decoding end character strings ("N" at step S207), the process operation from step S204 is executed again. Conversely, when it ends with one of the decoding end character strings ("Y" at step S207), the non-compressed data process loop (steps S201 to S203) is performed again.

For example, when the compressed data from the third line of Fig. 5 onward is sequentially decoded, the character string "<SECTION>" is soon restored. When a character string coincident with one of the decoding end character strings is restored in this manner, the data processing apparatus 14 leaves the compressed data process loop and starts the non-compressed data process loop. When the process operation for all of the data is accomplished in the non-compressed data process loop ("Y" at step S202), or in the compressed data process loop ("Y" at step S206), the data processing apparatus 14 completes the process of restoring the compressed document file.
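The restoring sequence of Fig. 7 can be sketched in Python as the mirror image of the forming loop. The same assumptions apply as for any sketch of this kind: the file is modelled as a token list (a `str` is a raw character, an `int` is a code), `decode_code` is a hypothetical placeholder inverse of a coder that emits code points, and the code table update of step S205 is omitted. The decoder switches mode from the restored text alone, exactly as the retrieving units 205 and 206 do.

```python
from typing import List, Union

Token = Union[str, int]   # str: raw character, int: code (assumed file model)

START_STRINGS = ("</SECTION>", "</SUBSECTION>")   # decoding start strings
END_STRINGS = ("<SECTION>", "<SUBSECTION>")       # decoding end strings
WINDOW = max(len(s) for s in START_STRINGS + END_STRINGS)


def decode_code(code: int) -> str:
    # Placeholder inverse of a hypothetical coder that emits code points.
    return chr(code)


def decompress(tokens: List[Token]) -> str:
    out: List[str] = []
    tail = ""            # bounded tail of restored characters
    decoding = False     # False: switch 207 at S2; True: at S1
    for tok in tokens:
        if decoding:
            ch = decode_code(int(tok))   # step S204 (update S205 omitted)
        else:
            ch = str(tok)                # step S201: raw character
        out.append(ch)
        tail = (tail + ch)[-WINDOW:]
        if not decoding and tail.endswith(START_STRINGS):    # step S203
            decoding = True
        elif decoding and tail.endswith(END_STRINGS):        # step S207
            decoding = False
    return "".join(out)  # S202 / S206: all data processed


tokens: List[Token] = (
    list("<TITLE>Spec</TITLE></SECTION>")
    + [ord(c) for c in "<P>secret</P><SECTION>"]
    + list("Claims")
)
print(decompress(tokens))   # <TITLE>Spec</TITLE></SECTION><P>secret</P><SECTION>Claims
```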

As described in detail above, the document managing apparatus according to the first embodiment of the present invention forms, from the document data, a compressed document file in which a portion of the document data is stored directly without compression. In other words, a compressed document file is formed that can be searched by keyword. Accordingly, with this first document managing apparatus, the content of a compressed document file can be predicted or confirmed without actually decompressing it.

Although the document managing apparatus of the first embodiment has been described as an apparatus for managing SGML-formatted document data, it may also be used to manage data in other formats (not limited to document data) simply by changing the control character strings stored in it. Control characters may also be employed in addition to control character strings.

On the other hand, when the compressed document file managed by the document managing apparatus according to the first embodiment is retrieved not by a whole tag but by "<" or ">", which are constituent elements of a tag, a code contained in the compressed data may be retrieved by mistake. To avoid such erroneous retrieval, the first document managing apparatus may have a retrieving function that ignores a retrieved character when a non-character code immediately follows it and then continues the retrieving operation. To make this retrieving mechanism reliable, whenever "0x3c" (the ASCII code of "<") or "0x3e" (the ASCII code of ">") appears in the compressed data constituting the compressed document file, a specific non-character code such as "0x00" may be inserted immediately after it. When the first document managing apparatus is arranged in this manner, the inserted code must be removed when the compressed document file is decompressed.
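The escaping scheme and the filtered retrieval described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the apparatus's implementation: it assumes the compressed data are available as a byte string, uses 0x00 (the example code given in the text) as the inserted code, and the function names are hypothetical.

```python
ESCAPE = 0x00             # code inserted after "<" / ">" (example value from the text)
TAG_BYTES = (0x3C, 0x3E)  # ASCII "<" and ">"


def escape_compressed(seg: bytes) -> bytes:
    """Insert ESCAPE after every 0x3c / 0x3e byte of a compressed segment."""
    out = bytearray()
    for b in seg:
        out.append(b)
        if b in TAG_BYTES:
            out.append(ESCAPE)
    return bytes(out)


def unescape_compressed(seg: bytes) -> bytes:
    """Remove the inserted codes again before the segment is decoded."""
    out = bytearray()
    drop_next = False
    for b in seg:
        if drop_next:            # this byte is the inserted ESCAPE code
            drop_next = False
            continue
        out.append(b)
        drop_next = b in TAG_BYTES
    return bytes(out)


def find_tag_char(data: bytes, ch: bytes, start: int = 0) -> int:
    """Find "<" or ">" in the file, ignoring hits followed by the ESCAPE code."""
    pos = data.find(ch, start)
    while pos != -1 and pos + 1 < len(data) and data[pos + 1] == ESCAPE:
        pos = data.find(ch, pos + 1)  # hit lies inside compressed data: keep searching
    return pos
```

A search for "<" in such a file therefore passes over any "<" byte produced by the coder, while a "<" in the uncompressed text, which is never followed by 0x00, is found normally.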

OPERATIONS OF SECOND DOCUMENT MANAGING 
APPARATUS 



The document managing apparatus of the first embodiment described above forms a compressed document file in which data contained in the document data are used directly as the non-compressed data. In contrast, a document managing apparatus according to a second embodiment forms a compressed document file that stores data produced by substituting the data contained in the document data in accordance with a predetermined rule, so that the stored data differ from the data contained in the document data. In other words, the second document managing apparatus forms compressed document data that contain no directly readable data. Since the operations of the document managing apparatus according to the second embodiment are similar to those of the first embodiment, only the differences will be explained.

First, the sequential operation by which the document managing apparatus according to the second embodiment forms a compressed document file will be explained with reference to Fig. 8 and Fig. 9. Fig. 8 is a functional block diagram for describing this compressed document file forming operation, and Fig. 9 is a flow chart of the same operation.

As represented in Fig. 8, data (namely, non-com- 
pressed character to be processed) derived from the ter- 
minal S2 of the switch 107 is supplied to a substituting 
unit 122, and an output from the substituting unit 122 is 
stored in a compressed document file in the document 
managing apparatus according to the second embodi- 
ment. 

A substitution table holding unit 123 is connected to the substituting unit 122 and holds a substitution table that associates each character with a substitute character. The substituting unit 122 outputs the character that this substitution table associates with the character supplied from the terminal S2.

That is, in the document managing apparatus of the second embodiment, as shown in Fig. 9, when a character is output in the non-compressed data output process loop (steps S301 to S303), the character contained in the document data is substituted and the substituted character is output (step S301).

Therefore, the compressed document file formed by the document managing apparatus according to the second embodiment contains no data that can be read directly. For instance, on the Internet a file is often relayed through a plurality of machines. When document data are transferred in this compressed document file format, the content of the document file can be prevented from being read on unauthorized machines along the route.

When an instruction is issued to retrieve a compressed document file by keyword, the document managing apparatus according to the second embodiment performs the retrieving operation using the keyword obtained by substituting the given keyword with the substitution table.
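The substitution of non-compressed characters (step S301) and the retrieval with a substituted keyword might be sketched as follows. The text does not specify the substitution table, so this sketch assumes a hypothetical one that rotates the printable ASCII range 0x21 to 0x7E by 29 positions; the names SUBSTITUTION_TABLE, substitute, and find_keyword are likewise illustrative. Python's built-in bytes.maketrans and bytes.translate perform the table lookup.

```python
# Hypothetical substitution table: rotate printable ASCII (0x21-0x7E) by 29 places.
_PLAIN = bytes(range(0x21, 0x7F))
_SUBST = _PLAIN[29:] + _PLAIN[:29]
SUBSTITUTION_TABLE = bytes.maketrans(_PLAIN, _SUBST)


def substitute(chunk: bytes) -> bytes:
    """Step S301 (sketch): substitute each non-compressed character before output."""
    return chunk.translate(SUBSTITUTION_TABLE)


def find_keyword(compressed_file: bytes, keyword: bytes) -> int:
    """Search the file with the substituted keyword; returns -1 if absent."""
    return compressed_file.find(substitute(keyword))
```

Because the table is one-to-one, a keyword occurs in the substituted text exactly where it occurred in the original text, which is why substituting the keyword is enough for retrieval.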

Next, the sequential operation by which the document managing apparatus according to the second embodiment restores a compressed document file will be explained with reference to Fig. 10 and Fig. 11. Fig. 10 is a functional block diagram for describing this decompressing operation, and Fig. 11 is a flow chart of the same operation.

As represented in Fig. 10, in the document managing apparatus according to the second embodiment, the data (character) derived from the terminal S2 of the switch 107 is supplied to an inverse-substituting unit 222, and the output of the inverse-substituting unit 222 is added to the document data obtained by decompressing the compressed document file.

An inverse-substitution table holding unit 223 is connected to the inverse-substituting unit 222 and holds an inverse-substitution table corresponding to the substitution table in the substitution table holding unit 123. The inverse-substituting unit 222 outputs the character that this inverse-substitution table associates with the character supplied from the terminal S2.

That is, in the document managing apparatus of the second embodiment, as shown in Fig. 11, the non-compressed data output process loop (steps S401 to S403) outputs the character produced by inverse-substituting the data (character) stored as non-compressed data in the compressed document file (step S401).
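Because the inverse-substitution table only has to undo the substitution table, it can be derived from it mechanically. Assuming a hypothetical substitution table that rotates the printable ASCII range 0x21 to 0x7E by 29 positions (the text leaves the actual table unspecified), step S401 reduces to another table lookup, sketched here with illustrative names:

```python
# Hypothetical substitution table (rotation by 29 within 0x21-0x7E).
_PLAIN = bytes(range(0x21, 0x7F))
_SUBST = _PLAIN[29:] + _PLAIN[:29]
SUBSTITUTION_TABLE = bytes.maketrans(_PLAIN, _SUBST)
# The inverse table maps every substitute character back to its original.
INVERSE_TABLE = bytes.maketrans(_SUBST, _PLAIN)


def inverse_substitute(chunk: bytes) -> bytes:
    """Step S401 (sketch): restore non-compressed characters read from the file."""
    return chunk.translate(INVERSE_TABLE)
```

Any one-to-one table would serve; bytes outside the table's domain (here, the space and control codes) pass through unchanged in both directions.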

OPERATIONS OF THIRD DOCUMENT MANAGING 
APPARATUS 

A document managing apparatus according to a third embodiment of the present invention is based on the document managing apparatus of the first embodiment described above. When the document managing apparatus according to the third embodiment forms a compressed document file in which non-compressed data are mixed with compressed data, it also forms an index file consisting only of the non-compressed data. Furthermore, the format of this compressed document file differs from that of the compressed document file formed by the document managing apparatus of the first embodiment. In addition, the document managing apparatus of the third embodiment allows the range to be decompressed to be designated by means of the index file.

Referring now to Fig. 12, the compressed document file forming sequence of the document managing apparatus (data processing apparatus) according to the third embodiment will first be described.

When an instruction is issued to compress document data, the data processing apparatus first starts a non-compressed data output process loop (steps S501 to S503). In this loop, one character of the document data (the character to be processed) is output directly, that is, written into both the compressed document file and the index file (step S501). Next, a judgment is made as to whether the process operation is complete for all characters constituting the document data (step S502). When characters remain to be processed ("N" at step S502), a check is made as to whether the character string formed by the most recently processed characters, including the character processed this time, coincides with one of the predetermined coding start character strings (step S503).

When the processed character string does not coincide with any coding start character string ("N" at step S503), the process operation from step S501 is executed again. When a character string coinciding with one of the coding start character strings has been processed ("Y" at step S503), a compressed data output process loop (steps S504 to S507) is started.

In the compressed data output process loop, the next character is read from the document data, and a code corresponding to this character is output (step S504). The code is determined by referring to the context of this character. Thereafter, the content of the code table related to the context used in the coding operation is updated (step S505).

Next, a judgment is made as to whether the process operation is complete for all characters constituting the document data (step S506). When characters remain to be processed ("N" at step S506), another judgment is made as to whether the character string containing the characters processed so far coincides with any of the coding end character strings (step S507). When the processed character string does not coincide with any coding end character string ("N" at step S507), the process operation from step S504 is executed again.

On the other hand, when the character string processed so far coincides with one of the coding end character strings ("Y" at step S507), the code table is initialized (step S508). Thereafter, the coding end character string detected at step S507 is output, uncompressed, to the compressed document file and the index file (step S509), and the non-compressed data output process loop (steps S501 to S503) is started again.

This compressed document file forming operation ends when it is detected that all of the data have been processed, either in the non-compressed data output process loop ("Y" at step S502) or in the compressed data output process loop ("Y" at step S506).

The compressed document file forming operation of the document managing apparatus according to the third embodiment will now be described in more detail, using as an example the processing of the document data shown in Fig. 4. In the following description, "</SECTION>" and "</SUBSECTION>" are set as the coding start character strings, and "<SECTION>" and "<SUBSECTION>" are set as the coding end character strings.

In this case, since the first coding start character string to appear is "</SECTION>" (second line), the characters from the head of the document data up to "</SECTION>" in the second line are processed in the non-compressed data output process loop. The characters following the coding start character string "</SECTION>" are then processed in the compressed data output process loop. After the compressed data output process loop has started, the first coding end character string to appear is "<SECTION>" (fourth line). As a result, the characters from the beginning of the third line up to the coding end character string "<SECTION>" in the fourth line are encoded and the resulting codes are output. When the coding of the ">" contained in the coding end character string "<SECTION>" is complete, the code table is initialized, and the coding end character string "<SECTION>" is written into the compressed document file and the index file.

This series of operations is repeated for the rest of the document data, so that the document managing apparatus of the third embodiment forms the compressed document file shown in Fig. 13 and the index file shown in Fig. 14.

That is, each piece of non-compressed data stored in the compressed document file formed by the third document managing apparatus consists of the corresponding non-compressed data of the compressed document file formed by the first document managing apparatus (Fig. 5), preceded by the coding end character string (start tag). The index file stores data identical to the non-compressed data stored in the compressed document file. Because the code table is initialized each time the compressed data output process loop ends, each piece of compressed data stored in the compressed document file can be restored on its own.

Next, the index corresponding region decompressing process, which decompresses only a designated content range of the compressed document file, will be described in detail.

Fig. 15 shows the operation sequence of the document managing apparatus (data processing apparatus) during the index corresponding region decompressing process. The flow shown in Fig. 15 starts when the user issues a predetermined instruction containing information that specifies the document data.

As shown in this drawing, upon receipt of the predetermined instruction from the user, the document managing apparatus (data processing apparatus) displays on the display apparatus the content of the index file corresponding to the document data designated by this instruction (step S601). The data processing apparatus displays only the data stored in the index file that are sandwiched between a start tag and an end tag (hereinafter referred to as an "index"). For example, when the document data corresponding to the index file shown in Fig. 14 have been designated, the data shown in Fig. 16 are displayed on the display apparatus.

Thereafter, the data processing apparatus enters a state of waiting for an instruction from the user (step S602). At step S602, the data processing apparatus waits for the user to designate one of the indexes shown on the screen by clicking the mouse, and the user operates the mouse to instruct the process to be executed. At this step the user may also instruct the apparatus to display the content of another index file or to end the display of this index file; however, only the case where the mouse is clicked while the mouse cursor is positioned on one of the indexes is described here.

When the mouse is clicked while the mouse cursor is located on one of the indexes ("Y" at step S602), the data processing apparatus recognizes that this index has been selected and, with reference to the index file, specifies the index data (namely, the index together with the tags sandwiching it) corresponding to the selected index (step S603).

Then, the data processing apparatus judges whether the specified index data relate to "TITLE" (step S604). If they do ("Y" at step S604), an overall decompressing process, which decompresses all of the contents of the compressed document file corresponding to the subject document data, is executed (step S605). The restored result is then displayed or stored as a file, and this process operation ends.

Fig. 17 shows the operation sequence of the data processing apparatus during the overall decompressing process. This process is also performed when an instruction is issued to restore the compressed document file.

As indicated in Fig. 17, during the overall decompressing process, the data processing apparatus executes a non-compressed data process loop (steps S701 to S703). In this loop, the data processing apparatus first outputs the data of the first character stored in the compressed document file as the restored result (step S701). Next, a judgment is made as to whether the process operation is complete for all data in the compressed document file (step S702). When data remain to be processed ("N" at step S702), another check is made as to whether the processed character string (containing the character processed this time) coincides with any of the decoding start character strings (step S703).

When the processed character string does not coincide with any of the decoding start character strings ("N" at step S703), the process operation from step S701 is executed again. On the other hand, when the character string processed so far coincides with one of the decoding start character strings ("Y" at step S703), the compressed data process loop (steps S704 to S707) is started.

In the compressed data process loop, the data processing apparatus reads the necessary amount of data (codes) from the compressed document file and outputs the characters obtained by decoding these codes (step S704). This decoding operation refers to the character string (context) that has already been decoded. Thereafter, the content of the code table related to the context used in the decoding operation is updated (step S705). The data processing apparatus then checks whether the process operation has been completed for all of the data stored in the compressed document file (step S706). When data remain to be processed ("N" at step S706), a check is made as to whether the decoded character string coincides with any of the decoding end character strings (step S707). If it does not coincide with any of them ("N" at step S707), the data processing apparatus resumes the process operation from step S704. On the other hand, when the decoded character string coincides with one of the decoding end character strings ("Y" at step S707), the data processing apparatus initializes the code tables related to all of the contexts (step S708). Next, the data processing apparatus skips reading of the decoding end character string located at the head of the data to be processed next (step S709); in other words, it skips the coding end character string that was added when the compressed document file was formed. Thereafter, the data processing apparatus starts the non-compressed data process loop (steps S701 to S703) again.

When this operation has been performed for all of the data stored in the compressed document file ("Y" at step S706), the overall decompressing process is complete.
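A matching sketch of the overall decompressing process of Fig. 17 follows. It assumes each compressed segment is an independent zlib stream, a stand-in for the context-coded data of the text. Because a zlib stream marks its own end, the sketch uses that boundary in place of scanning the decoded output for a decoding end string at step S707; step S709 then skips the uncompressed copy of that string. Names and tag lists are illustrative.

```python
import zlib

START_STRINGS = (b"</SECTION>", b"</SUBSECTION>")  # decoding start strings (example)
END_STRINGS = (b"<SECTION>", b"<SUBSECTION>")      # decoding end strings (example)


def first_match(data: bytes, pos: int, patterns):
    """Return (end index, pattern) of the earliest-ending pattern at or after pos, or None."""
    best = None
    for p in patterns:
        k = data.find(p, pos)
        if k >= 0 and (best is None or k + len(p) < best[0]):
            best = (k + len(p), p)
    return best


def decompress_all(data: bytes) -> bytes:
    """Overall decompressing process (sketch of Fig. 17)."""
    out = bytearray()
    pos = 0
    while pos < len(data):
        # S701-S703: copy stored characters up to and including a decoding start string.
        m = first_match(data, pos, START_STRINGS)
        stop = m[0] if m else len(data)
        out += data[pos:stop]
        pos = stop
        if pos >= len(data):
            break
        # S704-S707: a fresh decoder per segment plays the role of the reset at S708.
        d = zlib.decompressobj()
        seg = d.decompress(data[pos:])
        out += seg
        pos = len(data) - len(d.unused_data)
        # S709: skip the uncompressed copy of the decoding end string.
        for p in END_STRINGS:
            if seg.endswith(p) and data.startswith(p, pos):
                pos += len(p)
                break
    return bytes(out)
```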

Referring back to Fig. 15, the explanation about the 
index corresponding region decompressing process is 
continued. 

When the index data do not correspond to "TITLE" ("N" at step S604), the data processing apparatus acquires (stores) the head tag of the index data as an end control character string (step S606). Then, a partially decompressing process, which decompresses only the data related to the selected index among the contents of the compressed document file, is executed (step S607), and the index corresponding region decompressing process ends.
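The selection logic of steps S601 to S606 might look like the following sketch. It assumes, as in Fig. 14, that every index in the index file is a start tag, heading text, and the matching end tag, and it extracts them with a regular expression; the function names and return values are hypothetical.

```python
import re

# Assumed index shape: <TAG>heading</TAG>, as in the index file of Fig. 14.
_INDEX = re.compile(r"<(\w+)>(.*?)</\1>", re.S)


def list_indexes(index_text: str):
    """Step S601 (sketch): (tag, heading, index data) for each index in the file."""
    return [(m.group(1), m.group(2).strip(), m.group(0))
            for m in _INDEX.finditer(index_text)]


def plan_restore(tag: str):
    """Steps S604/S606 (sketch): choose overall or partial decompression."""
    if tag == "TITLE":
        return ("overall", None)
    # The head tag of the index data becomes the end control character string.
    return ("partial", "<" + tag + ">")
```

The headings returned by list_indexes correspond to what is displayed at step S601, and the third element is the index data later searched for in the compressed document file.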

Fig. 18 is a flow chart describing the operations of the data processing apparatus during the partially decompressing process. The overall flow of the partially decompressing process is the same as that of the overall decompressing process (Fig. 17); only the starting condition and the end condition differ. Accordingly, only the differing process operations will be explained here.

In the overall decompressing process, decompression starts from the head data of the compressed document file. In the partially decompressing process, by contrast, a restore starting position is first specified based on the index data (step S800). That is, the index data corresponding to the selected index are searched for in the compressed document file, and the first character of the retrieved index data is specified as the restore starting position.

Thereafter, the data from this restore starting position are processed in a sequence similar to that of the overall decompressing process.

In the overall decompressing process, the decompressing process ends when all of the data stored in the compressed document file have been processed. In the partially decompressing process, by contrast, an end judgment is made (step S809) after the code table has been initialized (step S808). Concretely, the data processing apparatus judges whether the decoding end character string found at step S807 coincides with the end control character string it has stored. When there is no coincidence ("N" at step S809), the data processing apparatus skips reading of the decoding end character string present at the head of the portion to be processed next (step S810) and starts the non-compressed data process loop.

On the other hand, when the decoding end character string coincides with the end control character string ("Y" at step S809), the end control character string is removed from the decompressed result (step S811), and the partially decompressing process ends.
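The partially decompressing process of Fig. 18 differs from the overall process only at its start (step S800) and its end judgment (steps S809 to S811), as the following sketch shows. It assumes each compressed segment is an independent zlib stream (a stand-in for the text's context coder) and locates the restore starting position with a plain byte search for the index data; all names are hypothetical.

```python
import zlib

START_STRINGS = (b"</SECTION>", b"</SUBSECTION>")  # decoding start strings (example)
END_STRINGS = (b"<SECTION>", b"<SUBSECTION>")      # decoding end strings (example)


def first_match(data: bytes, pos: int, patterns):
    """Return (end index, pattern) of the earliest-ending pattern at or after pos, or None."""
    best = None
    for p in patterns:
        k = data.find(p, pos)
        if k >= 0 and (best is None or k + len(p) < best[0]):
            best = (k + len(p), p)
    return best


def decompress_region(data: bytes, index_data: bytes) -> bytes:
    """Restore only the region belonging to index_data (sketch of Fig. 18)."""
    end_control = index_data[: index_data.index(b">") + 1]  # S606: head tag
    pos = data.index(index_data)                             # S800: restore start
    out = bytearray()
    while pos < len(data):
        m = first_match(data, pos, START_STRINGS)
        stop = m[0] if m else len(data)
        out += data[pos:stop]
        pos = stop
        if pos >= len(data):
            break
        d = zlib.decompressobj()
        seg = d.decompress(data[pos:])
        out += seg
        pos = len(data) - len(d.unused_data)
        if seg.endswith(end_control):       # S809: end control string reached
            del out[-len(end_control):]     # S811: drop it from the result
            break
        for p in END_STRINGS:               # S810: skip the uncompressed copy
            if seg.endswith(p) and data.startswith(p, pos):
                pos += len(p)
                break
    return bytes(out)
```

With the section "2. B" selected, the sketch restores that section and its subsection and stops at the next "<SECTION>", mirroring the worked example in the text.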

The index corresponding region decompressing process will now be described in more detail, using as an example the case where "2. SCOPE OF CLAIM FOR A PATENT" in Fig. 14 is designated.

In this case, since the corresponding index data are "<SECTION> 2. SCOPE OF CLAIM FOR A PATENT </SECTION>", "<SECTION>" is specified as the end control character string. Next, the decompressing process starts from the first character of this character string as retrieved from the compressed document file, so that "<SECTION> 2. SCOPE OF CLAIM FOR A PATENT </SECTION>" is processed in the non-compressed data process loop. In the first compressed data process loop executed thereafter, the decoding end character string "<SUBSECTION>" is restored from the compressed data stored in the compressed document file. However, since this character string does not coincide with the end control character string "<SECTION>", the data processing apparatus continues to restore the compressed document file. Then, when "<SECTION>" is restored in a subsequent execution of the compressed data process loop, the data processing apparatus removes this "<SECTION>" from the restored result and completes the partially decompressing process. That is, the data from the index data "<SECTION> 2. SCOPE OF CLAIM FOR A PATENT </SECTION>" up to the start of the next section are decompressed, and the partially decompressing process then ends.

Consequently, in the index corresponding region decompressing process, as schematically shown in Fig. 19, the data stored in the region corresponding to the selected index (the region enclosed by horizontal rules in Fig. 19) are restored. In other words, when the index related to the title is selected, all of the contents are restored; when the index of a subsection is selected, only the data of that subsection are restored; and when the index of a section is selected, all of the data related to that section (including the data of its subsections) are restored.

With the document managing apparatus of the third embodiment, only a portion of the compressed document file can thus be restored.

As described above, the document managing apparatus according to the third embodiment adds the coding end character string after outputting the compressed data, so that each piece of non-compressed data stored in the compressed document file contains the coding end character string (start tag). However, the present invention is not limited to this sequential operation. Alternatively, the third document managing apparatus may buffer several of the characters to be processed and perform the coding operation only on those characters that can be determined not to form part of a start tag, so that the start tag is contained in each piece of non-compressed data stored in the compressed document file. When the third document managing apparatus is arranged in this manner, the compressed data stored in the compressed document file are decoded while searching for the start tag, which defines the boundary between compressed data and non-compressed data.

ARRANGEMENT/OPERATIONS OF FOURTH 
DOCUMENT MANAGING APPARATUS 

A document managing apparatus according to a fourth embodiment of the present invention forms the same index file as the document managing apparatus according to the third embodiment. Unlike the third embodiment, however, it forms a compressed document file in which first compression data, compressed by a static coding process, are mixed with second compression data, compressed by a dynamic coding process. The fourth document managing apparatus also forms a corresponding relationship management file, which defines the relationship between the compressed document file and the index file.

Fig. 20 is a flow chart showing the process operation by which the document managing apparatus (data processing apparatus) of the fourth embodiment forms the compressed document file. In this fourth document managing apparatus, "</TITLE>", "</SECTION>", and "</SUBSECTION>" are given as coding start character strings, and "<SECTION>" and "<SUBSECTION>" are given as coding end character strings.

When an instruction is issued to compress document data, the data processing apparatus first starts a first compressed data output process loop (steps S901 to S903). In this loop, one character of the document data (the character to be processed) is output directly to the index file, and the code obtained by coding this character with the static code table is written into the compressed document file (step S901). At this step S901, the data processing apparatus also accumulates the size of the data output to the compressed document file.

Next, the data processing apparatus judges whether the process operation is complete for all characters constituting the document data (step S902). When data (characters) remain to be processed ("N" at step S902), another judgment is made as to whether the character string processed so far coincides with any of the predetermined coding start character strings (step S903).

When the processed character string does not coincide with any coding start character string ("N" at step S903), the data processing apparatus executes the process operation from step S901 again. On the other hand, when the character string processed so far coincides with one of the coding start character strings ("Y" at step S903), the fourth data processing apparatus starts a second compressed data output process loop (steps S904 to S907).

In the second compressed data output process loop, the data processing apparatus reads the next character from the document data and outputs a code corresponding to this character (step S904). The code is determined by referring to the context of this character. At this step S904, the data processing apparatus also accumulates the size of the data written into the compressed document file. Thereafter, the data processing apparatus updates the content of the code table related to the context used in the coding operation (step S905).

Next, the data processing apparatus judges whether the process operation has been completed for all of the characters constituting the document data (step S906). When data remain to be processed ("N" at step S906), the data processing apparatus judges whether the processed character string coincides with one of the predetermined coding end character strings (step S907). If the processed character string does not coincide with any of the coding end character strings ("N" at step S907), the data processing apparatus again executes the process operation from step S904. On the other hand, if the processed character string coincides with one of the coding end character strings ("Y" at step S907), the data processing apparatus initializes the code table (step S908).

Next, the data processing apparatus outputs the coding end character string detected at step S907 to the index file, and outputs the code obtained by statically coding this character string to the compressed document file (step S909). The data processing apparatus then stores the storage position information of this static code within the compressed document file into the corresponding relationship management file (step S910). This storage position information is the accumulated data size up to the head bit of the static code within the compressed document file. Note that the data processing apparatus defines the storage position information from the running total of the data sizes accumulated so far and, only after the storage position information has been defined, adds the data size of the static code written at step S909 to this running total.
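The ordering in step S910 matters: the stored position is the running total before the static code of the end string is written, and the code's own size is added only afterwards. A minimal sketch of that bookkeeping follows; the class name and the bit sizes in the example are hypothetical.

```python
class AccumulatedSizeTracker:
    """Hypothetical bookkeeping for the accumulated data size (steps S904, S909, S910)."""

    def __init__(self):
        self.total_bits = 0   # size of everything written to the compressed file so far
        self.positions = []   # entries of the corresponding relationship management file

    def add(self, nbits):
        # Called whenever a code of nbits bits is written to the compressed file.
        self.total_bits += nbits

    def write_static_end_string(self, nbits):
        # S910: the storage position is the size written before the static code...
        self.positions.append(self.total_bits)
        # ...and only then is the static code's own size accumulated.
        self.add(nbits)


tracker = AccumulatedSizeTracker()
tracker.add(3 * 8)                      # e.g. three statically coded 8-bit characters
for size in (3, 5, 2):                  # e.g. three dynamic codes of varying length
    tracker.add(size)
tracker.write_static_end_string(4 * 8)  # e.g. a 4-character end string, 8 bits each
print(tracker.positions, tracker.total_bits)  # [34] 66
```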

Thereafter, the data processing apparatus again 
executes the first compressed data output process loop. 

The data processing apparatus completes the compressed document file forming process when it detects that all of the data have been processed in the first compressed data output process loop ("Y" at step S902), or when it detects that all of the data have been processed in the second compressed data output process loop ("Y" at step S906).
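Putting steps S901 to S910 together, the sketch below shows, on a toy scale and as we read the flow chart, how the two loops alternate. Several stand-ins are assumptions rather than the patented coders: the static code of a character is simply the character itself (token `("S", ch)`), the dynamic code is its rank in an order-0 move-to-front list (token `("D", rank)`) that is reset at every coding end string, and positions are counted in tokens instead of bits. Following steps S904 to S909, the characters of a coding end string are coded dynamically and then written once more with the static code, and the position of that static copy is recorded. The alphabet must contain every character that appears inside the dynamically coded regions.

```python
def compress(text, start_strings, end_strings, alphabet):
    """Hypothetical toy version of the fourth embodiment's compression flow.

    Returns (tokens, index_text, positions):
      tokens     -- ("S", char) static codes and ("D", rank) dynamic codes
      index_text -- contents of the index file
      positions  -- token offset of each static segment after the first
    """
    tokens, index_parts, positions = [], [], []
    mtf = list(alphabet)   # dynamic model in its initial state
    dynamic = False
    seg = ""               # characters processed since the last mode switch

    for ch in text:
        seg += ch
        if not dynamic:
            # First loop: static code; the character also goes to the index file.
            tokens.append(("S", ch))
            index_parts.append(ch)
            if any(seg.endswith(s) for s in start_strings):       # S903
                dynamic, seg = True, ""
        else:
            # Second loop: dynamic code, then model update.
            rank = mtf.index(ch)                                   # S904
            tokens.append(("D", rank))
            mtf.insert(0, mtf.pop(rank))                           # S905
            end = next((s for s in end_strings if seg.endswith(s)), None)  # S907
            if end is not None:
                mtf = list(alphabet)                               # S908
                positions.append(len(tokens))                      # S910
                tokens.extend(("S", c) for c in end)               # S909
                index_parts.append(end)                            # S909
                dynamic, seg = False, ""
    return tokens, "".join(index_parts), positions


tokens, index_text, positions = compress(
    "x<b>yy</b>z<b>zy</b>", ["<b>"], ["</b>"], "<>/bxyz")
print(positions)    # [10, 24]
print(index_text)   # x<b></b>z<b></b>
```

Recording the offset before appending the static copy is what allows a decoder to jump directly to any segment after the first.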

In other words, as shown schematically in Fig. 21, the document managing apparatus according to the fourth embodiment forms a compressed document file in which first compression data produced by the static coding operation (the underlined portions in the drawing) are mixed with second compression data produced by the dynamic coding operation. It also forms the corresponding relationship management file, which stores the storage position of the head bit of each piece of first compression data from the second one onward.

Next, the index corresponding region decompressing process executed in the document managing apparatus of the fourth embodiment will be explained. Since the overall flow of this process is the same as that shown in Fig. 15, its explanation is omitted.

Fig. 22 is a flow chart of the partially decompressing process executed in the document managing apparatus of the fourth embodiment. The basic flow of this process is the same as that of the partially decompressing process of the document managing apparatus of the third embodiment described above. Therefore, only the steps that differ will be described.

In the document managing apparatus according to the third embodiment, the decompressing start position is specified by retrieving the storage position of the index data. In the document managing apparatus of the fourth embodiment, by contrast, the decompressing start position is specified with reference to the corresponding relationship management file (step S1000). Concretely, the data processing apparatus first determines which data stored in the index file the index data designated by the user corresponds to. For example, if the index data corresponds to the M-th data, the data processing apparatus reads out the (M-1)-th storage position information from the corresponding relationship management file. The position given by this storage position information is then specified as the decompressing start position.
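Because the corresponding relationship management file holds positions only for the second and subsequent pieces of first compression data, the M-th index maps to the (M-1)-th stored entry. A hypothetical helper for step S1000 is sketched below; treating the first index as starting at the head of the compressed file (offset 0) is our reading rather than something the text states.

```python
def decompress_start_position(m, positions):
    """Hypothetical helper for step S1000.

    m         -- 1-based number of the designated index data in the index file
    positions -- storage positions read from the corresponding relationship
                 management file, in order
    """
    if not 1 <= m <= len(positions) + 1:
        raise IndexError(f"index {m} does not exist")
    if m == 1:
        return 0               # first segment: head of the compressed file (assumed)
    return positions[m - 2]    # (M-1)-th entry, 1-based -> list slot m - 2


positions = [10, 24]
print([decompress_start_position(m, positions) for m in (1, 2, 3)])  # [0, 10, 24]
```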

Thereafter, processing continues on the data located after the decompressing start position. In the document managing apparatus of the fourth embodiment, the decoding operation uses the static code table while the data related to the index are processed.

In other words, in the loop executed immediately after the decompressing start position is specified, the necessary amount of data is first read from the compressed document file and decoded by using the static code table (step S1001). At step S1010, the data processing apparatus skips reading of the code corresponding to the decoding end character string.
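The decoding side can be sketched in the same toy terms; this is an assumption-laden illustration, not the patented decoder. Tokens `("S", ch)` carry statically coded characters, tokens `("D", rank)` carry ranks in an order-0 move-to-front list that is reset after every end control string, and a statically coded copy of each end string follows the dynamic segment that ended with it. When decoding starts at a stored position, that copy is skipped first (step S1010); with `partial=True` the loop stops once the end string closing the designated region has been decoded. The check of claim 10, that the end string corresponds to the start string at the tail of the index, is omitted. The token list in the example was derived by hand by applying this scheme to the text "x<b>yy</b>z<b>zy</b>".

```python
def _end_copy_length(tokens, i, end_strings):
    """Length of the statically coded end control string that begins at token i."""
    for s in sorted(end_strings, key=len, reverse=True):
        chunk = tokens[i:i + len(s)]
        if len(chunk) == len(s) and all(t == ("S", c) for t, c in zip(chunk, s)):
            return len(s)
    raise ValueError(f"no statically coded end control string at token {i}")


def decode(tokens, start_strings, end_strings, alphabet, start=0, partial=False):
    """Hypothetical toy decoder: full (start=0) or partial decompression."""
    out, seg = [], ""
    mtf = list(alphabet)        # dynamic model in its initial state
    dynamic = False
    i = start
    if start > 0:
        # S1010: a stored position points at the static copy of an end string.
        i += _end_copy_length(tokens, i, end_strings)
    while i < len(tokens):
        kind, value = tokens[i]
        i += 1
        if kind != ("D" if dynamic else "S"):
            raise ValueError(f"unexpected {kind!r} token at {i - 1}")
        if dynamic:
            ch = mtf.pop(value)         # undo move-to-front coding...
            mtf.insert(0, ch)           # ...and update the model like the encoder
        else:
            ch = value                  # S1001: static code (here: the character)
        out.append(ch)
        seg += ch
        if not dynamic:
            if any(seg.endswith(s) for s in start_strings):
                dynamic, seg = True, ""
        else:
            end = next((s for s in end_strings if seg.endswith(s)), None)
            if end is not None:
                if partial:
                    break               # the designated region is fully restored
                mtf = list(alphabet)    # mirror the encoder's model reset
                i += len(end)           # skip the static copy of the end string
                dynamic, seg = False, ""
    return "".join(out)


TOKENS = [("S", "x"), ("S", "<"), ("S", "b"), ("S", ">"),
          ("D", 5), ("D", 0), ("D", 1), ("D", 3), ("D", 4), ("D", 4),
          ("S", "<"), ("S", "/"), ("S", "b"), ("S", ">"),
          ("S", "z"), ("S", "<"), ("S", "b"), ("S", ">"),
          ("D", 6), ("D", 6), ("D", 2), ("D", 4), ("D", 5), ("D", 5),
          ("S", "<"), ("S", "/"), ("S", "b"), ("S", ">")]
args = (["<b>"], ["</b>"], "<>/bxyz")
print(decode(TOKENS, *args))                          # x<b>yy</b>z<b>zy</b>
print(decode(TOKENS, *args, start=10, partial=True))  # z<b>zy</b>
```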



Claims 

1. A document managing apparatus for forming compressed document data in response to an inputted character string, comprising:

control character string storing means for storing more than one start control character string and more than one end control character string;

coding means for encoding a character to thereby output coded character data;
retrieving means for retrieving start control character strings and end control character strings from a character string made by arranging inputted characters; and
control means, when a start control character string is retrieved by said retrieving means, for commencing a process operation such that coded character string data produced by encoding the inputted character string by said coding means is outputted as an element of the compressed document data; and when an end control character string is retrieved by said retrieving means, for commencing, after encoding the end control character string, another process operation such that the inputted character is directly outputted as an element of the compressed document data without character encoding by said coding means.

2. A document managing apparatus for outputting document data obtained by decompressing compressed document data inputted thereinto, comprising:

control character string storing means for storing more than one start control character string and more than one end control character string;

decoding means for outputting a character whose code is decoded;
judging means for judging whether or not a start control character string or an end control character string is present at a tail of a restored character string; and

control means, when said judging means judges the presence of a start control character string, for commencing a process operation to output a character produced by decoding the code contained in said compressed document data by employing said decoding means; and when said judging means judges the presence of an end control character string, for commencing another process operation to directly output said compressed document data without decoding by said decoding means.

3. A document managing apparatus as claimed in claim 1 wherein:

said coding means outputs a code corresponding to said character by using a dynamic coding model; and

said control means initializes the dynamic coding model used by said coding means when an end control character string is retrieved by said retrieving means.

4. A document managing apparatus as claimed in claim 1 wherein:

when said control means commences the process operation to directly output said compressed document data without decoding by said decoding means, said control means outputs an end control character string retrieved by said retrieving means.

5. A document managing apparatus as claimed in claim 1 wherein:

when an end control character string is retrieved by said retrieving means, said control means substitutes the inputted character by utilizing a substitution table for determining a correspondence relationship between input characters and output characters to thereby output the substituted result without coding by said coding means.

6. A document managing apparatus as claimed in claim 5, further comprising:

substituting means for substituting, when an instruction is issued to retrieve a certain character string with respect to the compressed document data, said certain character string by using said substitution table; and
retrieving means for executing a retrieval with employment of the character string substituted by said substituting means.

7. A document managing apparatus for managing document data in which start control character strings and end control character strings have been inserted before/after each of plural document elements, said start control character string and said end control character string corresponding to a content of said document element, comprising:

display means for displaying data;
control character string storing means for storing more than one start control character string and more than one end control character string;
first reading means for sequentially reading a character contained in document data to be compressed;

first outputting means for directly outputting the character read by said first reading means as an element of a compressed document file, and also for outputting said read character as an element of an index file;

first control means for stopping the reading operation of said first reading means when said first reading means reads the same character string as any of said start control character strings stored in said control character string storing means;

second reading means for commencing a reading operation of a character contained in said document data when the reading operation of said first reading means is stopped by said first control means;

second outputting means for outputting as an element of compressed document data a code corresponding to the character read by said second reading means;
second control means, when said second reading means reads the same character string as any of the end control character strings stored in said control character string storing means, for stopping the reading operation by said second reading means and also for restarting the reading operation by said first reading means;
storing means for storing said compressed document file and said index file;
display control means for displaying the respective data segmented by said end control character string and contained in said index file stored in said storing means, on said display means as an index when a predetermined instruction is issued;

designating means for designating one index from the indexes displayed by said display control means;
storage position specifying means for specifying a storage position of the index designated by said designating means within said compressed document file; and
partially decompressing means for decompressing data located subsequent to the storage position specified by said storage position specifying means and stored in said compressed document file until any of the end control character strings stored in said control character string storing means is restored.

8. A document managing apparatus as claimed in claim 7 wherein:

said document managing apparatus further comprises:

accumulated size detecting/storing means for detecting an accumulated size of data outputted as the elements of the compressed document data, and for storing the detected accumulated size, every time said first outputting means commences to output the elements of the compressed document data; and
said storage position specifying means specifies the storage position of said index within the compressed document file based on the accumulated size stored in said accumulated size detecting/storing means.

9. A document managing apparatus as claimed in claim 8 wherein:

said partially decompressing means includes:
restore-not-required data recognizing means for recognizing the data located preceding the storage position specified by said storage position specifying means and contained in said compressed document file as processed data;

first data reading means for sequentially reading unprocessed data contained in said compressed document file one character at a time;
first decoding means for outputting the data read by said first data reading means as a decoded result;

first reading control means for stopping the reading operation of said first data reading means when said first decoding means outputs the same character string as any of said start control character strings stored in said control character string storing means;
second data reading means for commencing a reading operation of the unprocessed data contained in said compressed document file when the reading operation of said first data reading means is stopped by said first reading control means;

second decoding means for outputting a character obtained by decoding the data read by said second data reading means;
second reading control means, when said second decoding means outputs the same character string as any of the end control character strings stored in said control character string storing means, for stopping the reading operation by said second data reading means; and
third reading control means for restarting the reading operation of said first data reading means when the control operation is carried out by said second reading control means, in the case that the character string read by said second data reading means is not equal to an end control character string corresponding to a start control character string contained in a tail of an index specified by said specifying means.

10. A document managing apparatus for managing document data in which a start control character string and an end control character string have been inserted before/after each of plural document elements, and said start control character string and said end control character string correspond to a content of said document element, comprising:

display means for displaying data;
control character string storing means for storing more than one start control character string and more than one end control character string;
first reading means for sequentially reading a character contained in document data to be compressed;

first outputting means for directly outputting a code obtained by statically coding the character read by said first reading means as an element of a compressed document file, and also for outputting said read character as an element of an index file;

first control means for stopping the reading operation of said first reading means when said first reading means reads the same character string as any of said start control character strings stored in said control character string storing means;

second reading means for commencing a reading operation of a character contained in said document data when the reading operation of said first reading means is stopped by said first control means;

second outputting means for outputting as an element of compressed document data a code obtained by dynamically coding the character read by said second reading means;
second control means, when said second reading means reads the same character string as any of the end control character strings stored in said control character string storing means, for stopping the reading operation by said second reading means, for initializing a model employed by said second outputting means to dynamically code the character, and also for restarting the reading operation by said first reading means;
accumulated size detecting/storing means for detecting an accumulated size of data which have been outputted as the elements of the compressed document file by said first outputting means and said second outputting means every time said first outputting means starts to output said character, and for storing said detected accumulated size;

storing means for storing said compressed document file and said index file;
first display control means for displaying as an index the respective data segmented by said start control character string and contained in said index file stored in said storing means, on said display means when a predetermined instruction is issued;

designating means for designating one index from the indexes displayed by said first display control means;

decoding-not-required data recognizing means for specifying a storage position of the index designated by said designating means within said compressed document file based on the accumulated size stored in said accumulated size detecting/storing means, and for recognizing data preceding said designated index within said compressed document file as processed data;
first data reading means for reading unprocessed data contained in said compressed document file;

first decoding means for outputting a character obtained by statically decoding the data read by said first data reading means as a decoded result;

first decoding control means for stopping the reading operation of said first data reading means when said first decoding means decodes the same character string as any of said start control character strings stored in said control character string storing means;
second data reading means for commencing a reading operation of unprocessed data contained in said compressed document file when the reading operation of said first data reading means is stopped by said first decoding control means;

second decoding means for outputting a character obtained by dynamically decoding the data read by said second data reading means;
second decoding control means, when said second decoding means decodes the same character string as any of the end control character strings stored in said control character string storing means, for stopping the reading operation by said second data reading means and also for initializing a model used to dynamically decode the data by said second decoding means; and
third decoding control means for restarting the reading operation of said first data reading means when the control operation is carried out by said second decoding control means, in the case that the character string decoded by said second decoding means is not equal to an end control character string corresponding to a start control character string contained in a tail of an index designated by said designating means.

11. A data compressing method for compressing original data into which a start control character string and an end control character string are inserted, comprising:

a retrieving step where the start control character string and the end control character string are retrieved from said original data; and
a data processing step where, when said start control character string is retrieved at said retrieving step, a process operation is commenced to output coded data obtained by coding original data subsequent to the first-mentioned original data; and when said end control character string is retrieved in said retrieving step, another process operation is commenced to directly output the original data subsequent to the first-mentioned original data.

12. A data compressing method as claimed in claim 11 wherein:

said data processing step performs the coding operation by employing a dynamic coding model, and initializes said dynamic coding model when said end control character string is retrieved at said retrieving step.

13. A data compressing method as claimed in claim 11 wherein:

in said data processing step, when the process operation to output the coded data is commenced, said end control character string retrieved at said retrieving step is outputted as an element of the compressed data.



14. A data compressing method as claimed in claim 11 wherein:

in said data processing step, when said end control character string is retrieved in said retrieving step, a process operation is commenced to output data obtained by substituting original data subsequent to the first-mentioned original data by employing a predetermined substitution table.

15. A data decompressing method for decompressing compressed data where data having a start control character string at a tail thereof is mixed with data obtained by coding data having an end control character string at a tail thereof, comprising:

a judging step for judging whether or not a start control character string or an end control character string is present at a tail of decoded data; and

a data processing step where, when the presence of said start control character string is judged at said judging step, a process operation is commenced to output as a restored result a character obtained by decoding compressed data subsequent to the first-mentioned compressed data; and when said end control character string is retrieved in said judging step, another process operation is commenced to directly output, as the restored result, the compressed data subsequent to the first-mentioned compressed data.

16. A data decompressing method as claimed in claim 15 wherein:

said data processing step performs the coding operation by employing a dynamic coding model, and initializes said dynamic coding model when said end control character string is retrieved at said retrieving step.

17. A data decompressing method as claimed in claim 15 wherein:

in said data processing step, when a process operation to output a decoded character is commenced, a firstly decoded control character string is not handled as the restored result.

18. A data decompressing method as claimed in claim 15 wherein:

in said data processing step, when said end control character string is retrieved in said retrieving step, a process operation is commenced to output, as the restored result, data obtained by substituting compressed data subsequent to the first-mentioned compressed data by employing a predetermined substitution table.
FIG. 3

FIG. 4